Riboswitches, methods for their use, and compositions for use with riboswitches

ABSTRACT

It has been discovered that certain natural mRNAs serve as metabolite-sensitive genetic switches wherein the RNA directly binds a small organic molecule. This binding process changes the conformation of the mRNA, which causes a change in gene expression by a variety of different mechanisms. Modified versions of these natural “riboswitches” (created by using various nucleic acid engineering strategies) can be employed as designer genetic switches that are controlled by specific effector compounds. Such effector compounds that activate a riboswitch are referred to herein as trigger molecules. The natural switches are targets for antibiotics and other small molecule therapies. In addition, the architecture of riboswitches allows actual pieces of the natural switches to be used to construct new non-immunogenic genetic control elements. For example, the aptamer (molecular recognition) domain can be swapped with other non-natural aptamers (or otherwise modified) such that the new recognition domain causes genetic modulation with user-defined effector compounds. The changed switches become part of a therapy regimen, turning on, turning off, or regulating protein synthesis. Newly constructed genetic regulation networks can be applied in such areas as living biosensors, metabolic engineering of organisms, and advanced forms of gene therapy treatments.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Divisional application of U.S. application Ser. No. 12/492,866, filed Jun. 26, 2009, which is a Divisional application of U.S. application Ser. No. 10/669,162, filed Sep. 22, 2003, which claims benefit of U.S. Provisional Application No. 60/412,468, filed Sep. 20, 2002. U.S. application Ser. No. 12/492,866, filed Jun. 26, 2009, U.S. application Ser. No. 10/669,162, filed Sep. 22, 2003, and U.S. Provisional Application No. 60/412,468, filed Sep. 20, 2002, are hereby incorporated herein by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grants NIH GM48858 and NIH GM559343 awarded by the National Institutes of Health, and Grant NSF EIA-0129939 awarded by the National Science Foundation. The government has certain rights in the invention.

REFERENCE TO SEQUENCE LISTING

The Sequence Listing submitted Feb. 23, 2011 as a text file named “YU_(—)6_(—)8408_AMD_AFD_Sequence_Listing.txt,” created on Feb. 17, 2011, and having a size of 234,977 bytes is hereby incorporated by reference pursuant to 37 C.F.R. §1.52(e)(5).

FIELD OF THE INVENTION

The disclosed invention is generally in the field of gene expression and specifically in the area of regulation of gene expression.

BACKGROUND OF THE INVENTION

Precision genetic control is an essential feature of living systems, as cells must respond to a multitude of biochemical signals and environmental cues by varying genetic expression patterns. Most known mechanisms of genetic control involve the use of protein factors that sense chemical or physical stimuli and then modulate gene expression by selectively interacting with the relevant DNA or messenger RNA sequence. Proteins can adopt complex shapes and carry out a variety of functions that permit living systems to sense accurately their chemical and physical environments. Protein factors that respond to metabolites typically act by binding DNA to modulate transcription initiation (e.g. the lac repressor protein; Matthews, K. S., and Nichols, J. C., 1998, Prog. Nucleic Acids Res. Mol. Biol. 58, 127-164) or by binding RNA to control either transcription termination (e.g. the PyrR protein; Switzer, R. L., et al., 1999, Prog. Nucleic Acids Res. Mol. Biol. 62, 329-367) or translation (e.g. the TRAP protein; Babitzke, P., and Gollnick, P., 2001, J. Bacteriol. 183, 5795-5802). Protein factors respond to environmental stimuli by various mechanisms such as allosteric modulation or post-translational modification, and are adept at exploiting these mechanisms to serve as highly responsive genetic switches (e.g. see Ptashne, M., and Gann, A. (2002). Genes and Signals. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.).

In addition to the widespread participation of protein factors in genetic control, it is also known that RNA can take an active role in genetic regulation. Recent studies have begun to reveal the substantial role that small non-coding RNAs play in selectively targeting mRNAs for destruction, which results in down-regulation of gene expression (e.g. see Hannon, G. J. 2002, Nature 418, 244-251 and references therein). This process of RNA interference takes advantage of the ability of short RNAs to recognize the intended mRNA target selectively via Watson-Crick base complementation, after which the bound mRNAs are destroyed by the action of proteins. RNAs are ideal agents for molecular recognition in this system because it is far easier to generate new target-specific RNA factors through evolutionary processes than it would be to generate protein factors with novel but highly specific RNA binding sites.

Although proteins fulfill most requirements that biology has for enzyme, receptor and structural functions, RNA also can serve in these capacities. For example, RNA has sufficient structural plasticity to form numerous ribozyme domains (Cech & Golden, Building a catalytic active site using only RNA. In: The RNA World, R. F. Gesteland, T. R. Cech, J. F. Atkins, eds., pp. 321-350 (1998); Breaker, In vitro selection of catalytic polynucleotides. Chem. Rev. 97, 371-390 (1997)) and receptor domains (Osborne & Ellington, Nucleic acid selection and the challenge of combinatorial chemistry. Chem. Rev. 97, 349-370 (1997); Hermann & Patel, Adaptive recognition by nucleic acid aptamers. Science 287, 820-825 (2000)) that exhibit considerable enzymatic power and precise molecular recognition. Furthermore, these activities can be combined to create allosteric ribozymes (Soukup & Breaker, Engineering precision RNA molecular switches. Proc. Natl. Acad. Sci. USA 96, 3584-3589 (1999); Seetharaman et al., Immobilized riboswitches for the analysis of complex chemical and biological mixtures. Nature Biotechnol. 19, 336-341 (2001)) that are selectively modulated by effector molecules.

These properties of RNA are consistent with speculation (Gold et al., From oligonucleotide shapes to genomic SELEX: novel biological regulatory loops. Proc. Natl. Acad. Sci. USA 94, 59-64 (1997); Gold et al., SELEX and the evolution of genomes. Curr. Opin. Gen. Dev. 7, 848-851 (1997); Nou & Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000); Gelfand et al., A conserved RNA structure element involved in the regulation of bacterial riboflavin synthesis genes. Trends Gen. 15, 439-442 (1999); Miranda-Rios et al., A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. USA 98, 9736-9741 (2001); Stormo & Ji, Do mRNAs act as direct sensors of small molecules to control their expression? Proc. Natl. Acad. Sci. USA 98, 9465-9467 (2001)) that certain mRNAs might employ allosteric mechanisms to provide genetic regulatory responses to the presence of specific metabolites. Although a thiamine pyrophosphate (TPP)-dependent sensor/regulatory protein had been proposed to participate in the control of thiamine biosynthetic genes (Webb & Downs, Characterization of thiL, encoding thiamin-monophosphate kinase, in Salmonella typhimurium. J. Biol. Chem. 272, 15702-15707 (1997)), no such protein factor has been shown to exist.

Transcription of the lysC gene of B. subtilis is repressed by high concentrations of lysine (Kochhar, S., and Paulus, H. 1996, Microbiol. 142:1635-1639; Mäder, U., et al., 2002, J. Bacteriol. 184:4288-4295; Patte, J. C. 1996. Biosynthesis of lysine and threonine. In: Escherichia coli and Salmonella: Cellular and Molecular Biology, F. C. Neidhardt, et al., eds., Vol. 1, pp. 528-541. ASM Press, Washington, D.C.; Patte, J.-C., et al., 1998, FEMS Microbiol. Lett. 169:165-170), but no protein factor has been identified that serves as the genetic regulator (Liao, H.-H., and Hseu, T.-H. 1998, FEMS Microbiol. Lett. 168:31-36). The lysC gene encodes aspartokinase II, which catalyzes the first step in the metabolic pathway that converts L-aspartic acid into L-lysine (Belitsky, B. R. 2002. Biosynthesis of amino acids of the glutamate and aspartate families, alanine, and polyamines. In: Bacillus subtilis and its Closest Relatives: from Genes to Cells. A. L. Sonenshein, J. A. Hoch, and R. Losick, eds., ASM Press, Washington, D.C.).

BRIEF SUMMARY OF THE INVENTION

It has been discovered that certain natural mRNAs serve as metabolite-sensitive genetic switches wherein the RNA directly binds a small organic molecule. This binding process changes the conformation of the mRNA, which causes a change in gene expression by a variety of different mechanisms. Modified versions of these natural “riboswitches” (created by using various nucleic acid engineering strategies) can be employed as designer genetic switches that are controlled by specific effector compounds. Such effector compounds that activate a riboswitch are referred to herein as trigger molecules. The natural switches are targets for antibiotics and other small molecule therapies. In addition, the architecture of riboswitches allows actual pieces of the natural switches to be used to construct new non-immunogenic genetic control elements. For example, the aptamer (molecular recognition) domain can be swapped with other non-natural aptamers (or otherwise modified) such that the new recognition domain causes genetic modulation with user-defined effector compounds. The changed switches become part of a therapy regimen, turning on, turning off, or regulating protein synthesis. Newly constructed genetic regulation networks can be applied in such areas as living biosensors, metabolic engineering of organisms, and advanced forms of gene therapy treatments.

Disclosed are isolated and recombinant riboswitches, recombinant constructs containing such riboswitches, heterologous sequences operably linked to such riboswitches, and cells and transgenic organisms harboring such riboswitches, riboswitch recombinant constructs, and riboswitches operably linked to heterologous sequences. The heterologous sequences can be, for example, sequences encoding proteins or peptides of interest, including reporter proteins or peptides. Preferred riboswitches are, or are derived from, naturally occurring riboswitches.

Also disclosed are chimeric riboswitches containing heterologous aptamer domains and expression platform domains. That is, chimeric riboswitches are made up of an aptamer domain from one source and an expression platform domain from another source. The heterologous sources can be, for example, different specific riboswitches or different classes of riboswitches. The heterologous aptamers can also come from non-riboswitch aptamers. The heterologous expression platform domains can also come from non-riboswitch sources.

Also disclosed are compositions and methods for selecting and identifying compounds that can activate, deactivate or block a riboswitch. Activation of a riboswitch refers to the change in state of the riboswitch upon binding of a trigger molecule. A riboswitch can be activated by compounds other than the trigger molecule and in ways other than binding of a trigger molecule. The term trigger molecule is used herein to refer to molecules and compounds that can activate a riboswitch. This includes the natural or normal trigger molecule for the riboswitch and other compounds that can activate the riboswitch. Natural or normal trigger molecules are the trigger molecule for a given riboswitch in nature or, in the case of some non-natural riboswitches, the trigger molecule for which the riboswitch was designed or with which the riboswitch was selected (as in, for example, in vitro selection or in vitro evolution techniques). Trigger molecules other than the natural or normal trigger molecule can be referred to as non-natural trigger molecules.

Deactivation of a riboswitch refers to the change in state of the riboswitch when the trigger molecule is not bound. A riboswitch can be deactivated by binding of compounds other than the trigger molecule and in ways other than removal of the trigger molecule. Blocking of a riboswitch refers to a condition or state of the riboswitch where the presence of the trigger molecule does not activate the riboswitch.

Also disclosed are compounds, and compositions containing such compounds, that can activate, deactivate or block a riboswitch. Also disclosed are compositions and methods for activating, deactivating or blocking a riboswitch. Riboswitches function to control gene expression through the binding or removal of a trigger molecule. Compounds can be used to activate, deactivate or block a riboswitch. The trigger molecule for a riboswitch (as well as other activating compounds) can be used to activate a riboswitch. Compounds other than the trigger molecule generally can be used to deactivate or block a riboswitch. Riboswitches can also be deactivated by, for example, removing trigger molecules from the presence of the riboswitch. A riboswitch can be blocked by, for example, binding of an analog of the trigger molecule that does not activate the riboswitch.

Also disclosed are compositions and methods for altering expression of an RNA molecule, or of a gene encoding an RNA molecule, where the RNA molecule includes a riboswitch, by bringing a compound into contact with the RNA molecule. Riboswitches function to control gene expression through the binding or removal of a trigger molecule. Thus, subjecting an RNA molecule of interest that includes a riboswitch to conditions that activate, deactivate or block the riboswitch can be used to alter expression of the RNA. Expression can be altered as a result of, for example, termination of transcription or blocking of ribosome binding to the RNA. Binding of a trigger molecule can, depending on the nature of the riboswitch, reduce or prevent expression of the RNA molecule or promote or increase expression of the RNA molecule.

Also disclosed are compositions and methods for regulating expression of an RNA molecule, or of a gene encoding an RNA molecule, by operably linking a riboswitch to the RNA molecule. A riboswitch can be operably linked to an RNA molecule in any suitable manner, including, for example, by physically joining the riboswitch to the RNA molecule or by engineering nucleic acid encoding the RNA molecule to include and encode the riboswitch such that the RNA produced from the engineered nucleic acid has the riboswitch operably linked to the RNA molecule. Subjecting a riboswitch operably linked to an RNA molecule of interest to conditions that activate, deactivate or block the riboswitch can be used to alter expression of the RNA.

Also disclosed are compositions and methods for regulating expression of a naturally occurring gene or RNA that contains a riboswitch by activating, deactivating or blocking the riboswitch. If the gene is essential for survival of a cell or organism that harbors it, activating, deactivating or blocking the riboswitch can result in death, stasis or debilitation of the cell or organism. For example, activating a naturally occurring riboswitch in a naturally occurring gene that is essential to survival of a microorganism can result in death of the microorganism (if activation of the riboswitch turns off or represses expression). This is one basis for the use of the disclosed compounds and methods for antimicrobial and antibiotic effects.

Also disclosed are compositions and methods for regulating expression of an isolated, engineered or recombinant gene or RNA that contains a riboswitch by activating, deactivating or blocking the riboswitch. The gene or RNA can be engineered or can be recombinant in any manner. For example, the riboswitch and coding region of the RNA can be heterologous, the riboswitch can be recombinant or chimeric, or both. If the gene encodes a desired expression product, activating or deactivating the riboswitch can be used to induce expression of the gene and thus result in production of the expression product. If the gene encodes an inducer or repressor of gene expression or of another cellular process, activation, deactivation or blocking of the riboswitch can result in induction, repression, or de-repression of other, regulated genes or cellular processes. Many such secondary regulatory effects are known and can be adapted for use with riboswitches. An advantage of riboswitches as the primary control for such regulation is that riboswitch trigger molecules can be small, non-antigenic molecules.

Also disclosed are compositions and methods for altering the regulation of a riboswitch by operably linking an aptamer domain to the expression platform domain of the riboswitch (the result is a chimeric riboswitch). The aptamer domain can then mediate regulation of the riboswitch through the action of, for example, a trigger molecule for the aptamer domain. Aptamer domains can be operably linked to expression platform domains of riboswitches in any suitable manner, including, for example, by replacing the normal or natural aptamer domain of the riboswitch with the new aptamer domain. Generally, any compound or condition that can activate, deactivate or block the riboswitch from which the aptamer domain is derived can be used to activate, deactivate or block the chimeric riboswitch.

Also disclosed are compositions and methods for inactivating a riboswitch by covalently altering the riboswitch (by, for example, crosslinking parts of the riboswitch or coupling a compound to the riboswitch). Inactivation of a riboswitch in this manner can result from, for example, an alteration that prevents the trigger molecule for the riboswitch from binding, that prevents the change in state of the riboswitch upon binding of the trigger molecule, or that prevents the expression platform domain of the riboswitch from affecting expression upon binding of the trigger molecule.

Also disclosed are methods of identifying compounds that activate, deactivate or block a riboswitch. For example, compounds that activate a riboswitch can be identified by bringing into contact a test compound and a riboswitch and assessing activation of the riboswitch. If the riboswitch is activated, the test compound is identified as a compound that activates the riboswitch. Activation of a riboswitch can be assessed in any suitable manner. For example, the riboswitch can be linked to a reporter RNA and expression, expression level, or change in expression level of the reporter RNA can be measured in the presence and absence of the test compound. As another example, the riboswitch can include a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch. As can be seen, assessment of activation of a riboswitch can be performed with the use of a control assay or measurement or without the use of a control assay or measurement. Methods for identifying compounds that deactivate a riboswitch can be performed in analogous ways.

Identification of compounds that block a riboswitch can be accomplished in any suitable manner. For example, an assay can be performed for assessing activation or deactivation of a riboswitch in the presence of a compound known to activate or deactivate the riboswitch and in the presence of a test compound. If activation or deactivation is not observed as would be observed in the absence of the test compound, then the test compound is identified as a compound that blocks activation or deactivation of the riboswitch.

Also disclosed are biosensor riboswitches. Biosensor riboswitches are engineered riboswitches that produce a detectable signal in the presence of their cognate trigger molecule. Useful biosensor riboswitches can be triggered at or above threshold levels of the trigger molecules. Biosensor riboswitches can be designed for use in vivo or in vitro. For example, biosensor riboswitches operably linked to a reporter RNA that encodes a protein that serves as or is involved in producing a signal can be used in vivo by engineering a cell or organism to harbor a nucleic acid construct encoding the riboswitch/reporter RNA. An example of a biosensor riboswitch for use in vitro is a riboswitch that includes a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a biosensor riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch. Also disclosed are methods of detecting compounds using biosensor riboswitches. The method can include bringing into contact a test sample and a biosensor riboswitch and assessing the activation of the biosensor riboswitch. Activation of the biosensor riboswitch indicates the presence of the trigger molecule for the biosensor riboswitch in the test sample.

Also disclosed are compounds made by identifying a compound that activates, deactivates or blocks a riboswitch and manufacturing the identified compound. This can be accomplished by, for example, combining compound identification methods as disclosed elsewhere herein with methods for manufacturing the identified compounds. For example, compounds can be made by bringing into contact a test compound and a riboswitch, assessing activation of the riboswitch, and, if the riboswitch is activated by the test compound, manufacturing the test compound that activates the riboswitch as the compound.

Also disclosed are compounds made by checking activation, deactivation or blocking of a riboswitch by a compound and manufacturing the checked compound. This can be accomplished by, for example, combining compound activation, deactivation or blocking assessment methods as disclosed elsewhere herein with methods for manufacturing the checked compounds. For example, compounds can be made by bringing into contact a test compound and a riboswitch, assessing activation of the riboswitch, and, if the riboswitch is activated by the test compound, manufacturing the test compound that activates the riboswitch as the compound. Checking compounds for their ability to activate, deactivate or block a riboswitch refers both to identification of compounds previously unknown to activate, deactivate or block a riboswitch and to assessing the ability of a compound to activate, deactivate or block a riboswitch where the compound was already known to activate, deactivate or block the riboswitch.

Also disclosed are methods for selecting, designing or deriving new riboswitches and/or new aptamers that recognize new trigger molecules. Such methods can involve production of a set of aptamer variants in a riboswitch, assessing the activation of the variant riboswitches in the presence of a compound of interest, selecting variant riboswitches that were activated (or, for example, the riboswitches that were the most highly or the most selectively activated), and repeating these steps until a variant riboswitch of a desired activity, specificity, combination of activity and specificity, or other combination of properties results. Also disclosed are riboswitches and aptamer domains produced by these methods.
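The iterative procedure described above (produce variants, assess activation, select, repeat) can be sketched in Python as a simple selection loop. This is only an illustrative sketch under stated assumptions: the names `mutate`, `evolve`, `toy_score` and `TARGET` are hypothetical, and `toy_score` is a stand-in for an experimental activation measurement, which in practice would come from an assay rather than from sequence comparison.

```python
import random

BASES = "ACGU"
TARGET = "GGCAUCGA"  # hypothetical motif used only by the toy score below


def toy_score(seq):
    """Stand-in for a measured activation level (assumption, not an assay)."""
    return sum(a == b for a, b in zip(seq, TARGET))


def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen position substituted."""
    i = rng.randrange(len(seq))
    choices = [b for b in BASES if b != seq[i]]
    return seq[:i] + rng.choice(choices) + seq[i + 1:]


def evolve(start, score, target_score, variants_per_parent=20,
           keep=5, max_rounds=50, seed=0):
    """Repeat variant production, assessment and selection until a variant
    reaches target_score or max_rounds is exhausted."""
    rng = random.Random(seed)
    pool = [start]
    for _ in range(max_rounds):
        # Produce a set of variants; parents stay in the candidate set,
        # so the best score never decreases from round to round.
        candidates = set(pool)
        for parent in pool:
            for _ in range(variants_per_parent):
                candidates.add(mutate(parent, rng))
        # Assess every candidate and keep the most highly "activated" ones.
        pool = sorted(candidates, key=lambda s: (-score(s), s))[:keep]
        if score(pool[0]) >= target_score:
            break
    return pool[0]


best = evolve("AAAAAAAA", toy_score, target_score=len(TARGET))
print(best, toy_score(best))
```

Keeping the parents in each round's candidate set is a design choice of this sketch, made so that selection cannot lose ground; a selection for specificity rather than activity would only need a different `score` function.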

The disclosed riboswitches, including the derivatives and recombinant forms thereof, generally can be from any source, including naturally occurring riboswitches and riboswitches designed de novo. Any such riboswitches can be used in or with the disclosed methods. However, different types of riboswitches can be defined and some such sub-types can be useful in or with particular methods (generally as described elsewhere herein). Types of riboswitches include, for example, naturally occurring riboswitches, derivatives and modified forms of naturally occurring riboswitches, chimeric riboswitches, and recombinant riboswitches. A naturally occurring riboswitch is a riboswitch having the sequence of a riboswitch as found in nature. Such a naturally occurring riboswitch can be an isolated or recombinant form of the naturally occurring riboswitch as it occurs in nature. That is, the riboswitch has the same primary structure but has been isolated or engineered in a new genetic or nucleic acid context. Chimeric riboswitches can be made up of, for example, part of a riboswitch of any or of a particular class or type of riboswitch and part of a different riboswitch of the same or of any different class or type of riboswitch; or part of a riboswitch of any or of a particular class or type of riboswitch and any non-riboswitch sequence or component. Recombinant riboswitches are riboswitches that have been isolated or engineered in a new genetic or nucleic acid context.

Different classes of riboswitches refer to riboswitches that have the same or similar trigger molecules or riboswitches that have the same or similar overall structure (predicted, determined, or a combination). Riboswitches of the same class generally have, but need not have, both the same or similar trigger molecules and the same or similar overall structure.

Additional advantages of the disclosed method and compositions will be set forth in part in the description which follows, and in part will be understood from the description, or can be learned by practice of the disclosed method and compositions. The advantages of the disclosed method and compositions will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the disclosed method and compositions and, together with the description, serve to explain the principles of the disclosed method and compositions.

FIGS. 1A and 1B show metabolite-dependent conformational changes in the 202-nucleotide leader sequence of the btuB mRNA. FIG. 1A shows separation of spontaneous RNA-cleavage products of the btuB leader using denaturing 10% polyacrylamide gel electrophoresis (PAGE). 5′-³²P-labeled mRNA leader molecules (arrow) were incubated for 41 hr at 25° C. in 20 mM MgCl₂, 50 mM Tris-HCl (pH 8.3 at 25° C.) in the presence (+) or absence (−) of 20 μM of AdoCbl. Lanes containing RNAs that have undergone no reaction, partial digest with alkali, and partial digest with RNase T1 (G-specific cleavage) are identified by NR, ⁻OH, and T1, respectively. The locations of product bands corresponding to cleavage after selected guanosine residues are identified by filled arrowheads. Arrowheads labeled 1 through 8 identify eight of the nine locations that exhibit effector-induced structure modulation, which experience an increase or decrease in the rate of spontaneous RNA cleavage. The image was generated using a phosphorimager (Molecular Dynamics), and cleavage yields were quantitated by using ImageQuant software. FIG. 1B shows the sequence and a secondary-structure model for the 202-nucleotide leader sequence of btuB mRNA (SEQ ID NO:1) in the presence of AdoCbl. Putative base-paired elements are designated P1 through P9. Complementary nucleotides in the loops of P4 and P9 that have the potential to form a pseudoknot are juxtaposed. Nine specific sites of structure modulation are identified by arrowheads. The asterisks demark the boundaries of the B₁₂ box (nucleotides 141-162). The coding region and the 38 nucleotides that reside immediately 5′ of the start codon (nucleotides 241-243) were not included in the 202-nucleotide fragment. The 315-nucleotide fragment includes the 202-nucleotide fragment, the remaining 38 nucleotides of the leader sequence, and the first 75 nucleotides of the coding region.

FIGS. 2A and 2B show that the btuB mRNA leader forms a saturable binding site for AdoCbl. FIG. 2A shows the dependence of spontaneous cleavage of btuB mRNA leader on the concentration of AdoCbl effector as represented by site 1 (G23) and site 2 (U68). 5′-³²P-labeled mRNA leader molecules were incubated, separated, and analyzed as described in the brief description of FIG. 1, and include identical control and marker lanes as indicated. Incubations contained concentrations of AdoCbl ranging from 10 nM to 100 μM (lanes 1 through 8) or did not include AdoCbl (−). FIG. 2B shows a composite plot of the fraction of RNA cleaved at six locations along the mRNA leader versus the logarithm of the concentration (c) of AdoCbl. Fraction cleaved values were normalized relative to the highest and lowest cleavage values measured for each location, including the values obtained upon incubation in the absence of AdoCbl. The inset defines the symbols used for each of six sites, while the remaining three sites were excluded from the analysis due to weak or obscured cleavage bands. Filled and open symbols represent increasing and decreasing cleavage yields, respectively, upon increasing the concentration of AdoCbl. The dashed line reflects a K_D of ~300 nM, as predicted by the concentration needed to generate half-maximal structural modulation. Data plotted were derived from a single PAGE analysis, of which two representative sections are depicted in FIG. 1A.
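As an illustration of the normalization and half-maximal reading described for FIG. 2B, the following Python sketch may help. It is not the analysis that was actually performed: the function names are hypothetical, the input values are made up for the example, and the one-site binding model (fraction modulated = c/(K_D + c)) is an assumption chosen because it places half-maximal modulation at c = K_D.

```python
def normalize(values):
    """Rescale cleavage values for one site so that the lowest measured
    value maps to 0 and the highest maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a nonzero range")
    return [(v - lo) / (hi - lo) for v in values]


def fraction_modulated(c, kd):
    """Assumed one-site model: fraction of RNA in the effector-bound state
    at effector concentration c (same units as kd)."""
    return c / (kd + c)


# Hypothetical raw cleavage yields for one site, lowest to highest effector.
print(normalize([2.0, 4.0, 6.0]))            # [0.0, 0.5, 1.0]
# At c equal to K_D (here 300 nM, in molar units) the model gives one half.
print(fraction_modulated(300e-9, 300e-9))    # 0.5
```

For a site whose cleavage decreases with effector (open symbols), the same normalization applies and the normalized value would be compared with 1 minus the modeled fraction.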

FIG. 3 shows that the 202-nucleotide mRNA leader causes an unequal distribution of AdoCbl in an equilibrium dialysis apparatus. I: Equilibration of tritiated effector was conducted in the absence of RNA. II: (step 1) Equilibration was conducted as in I, but with 200 pmoles of mRNA leader added to chamber b; (step 2) 5,000 pmoles of unlabeled AdoCbl was added to chamber b. III: Equilibrations were conducted as described in II, but wherein 5,000 pmoles of cyanocobalamin was added to chamber b. IV: (step 1) Equilibration was initiated as described in step 1 of II; (steps 2 and 3) the solution in chamber a was replaced with 25 μL of fresh equilibration buffer; (step 4) 5,000 pmoles of unlabeled AdoCbl was added to chamber b. The cpm ratio is the ratio of counts detected in chamber b relative to that of a. The dashed line represents a cpm ratio of 1, which is expected if equal distribution of tritium is established.
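The reasoning behind the cpm ratio can be made concrete with a small idealized model in Python. This is a sketch under assumptions that are not stated in the description: the two chambers have equal volume, the RNA is confined to chamber b, free effector equalizes across the membrane, binding is one-to-one with dissociation constant `kd`, and labeled and unlabeled effector bind identically. The function name `cpm_ratio` and the numerical inputs are hypothetical.

```python
import math


def cpm_ratio(ligand_total, rna, kd):
    """Expected ratio of counts in chamber b to chamber a at equilibrium.

    ligand_total: total effector (labeled plus unlabeled) expressed as a
                  concentration in one chamber volume
    rna:          RNA concentration in chamber b
    kd:           dissociation constant (same units, must be positive)
    """
    if kd <= 0:
        raise ValueError("kd must be positive")
    # Mass balance: ligand_total = 2*free + rna*free/(kd + free),
    # which rearranges to 2*free**2 + b*free - ligand_total*kd = 0.
    b = 2.0 * kd + rna - ligand_total
    free = (-b + math.sqrt(b * b + 8.0 * ligand_total * kd)) / 4.0
    # Chamber a holds free effector only; chamber b holds free plus bound.
    return 1.0 + rna / (kd + free)


print(cpm_ratio(1.0, 0.0, 0.3))     # no RNA: ratio of 1
print(cpm_ratio(0.01, 8.0, 0.3))    # trace effector with RNA: ratio well above 1
print(cpm_ratio(200.0, 8.0, 0.3))   # large excess of effector: ratio near 1
```

Under this model, a ratio above 1 indicates binding in chamber b, and adding a large excess of a competing ligand that binds the same site pushes the ratio back toward 1, whereas a non-binding analog would leave it unchanged.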

FIGS. 4A and 4B show selective molecular recognition of effectors by the btuB mRNA leader. FIG. 4A shows a chemical structure of AdoCbl (1) and various effector analogs (2 through 11, ref 30). FIG. 4B shows a determination of analog binding by monitoring modulation of spontaneous cleavage of the 202-nucleotide btuB RNA leader. 5′-³²P-labeled mRNA leader molecules were incubated, separated, and analyzed as described in the legend to FIG. 1A, and include identical control and marker lanes as indicated. The sections of three PAGE analyses encompassing site 2 (U68) are depicted. Below each image is plotted the amount of RNA cleaved (normalized with relation to the lowest and highest levels of cleavage at U68 in each gel) for each effector as indicated, or for no effector (−). The compound 11 (13-epi-AdoCbl) is an epimer of AdoCbl wherein the configuration at C13 is inverted, so that the e propionamide side chain is above the plane of the corrin ring; see Brown et al., Conformational studies of 5′-deoxyadenosyl-13-epicobalamin, a coenzymatically active structural analog of coenzyme B₁₂. Polyhedron 17, 2213 (1998).

FIGS. 5A, 5B, 5C, 5D, 5E and 5F show mutations in the mRNA leader and their effects on AdoCbl binding and genetic control. FIG. 5A shows that the sequence of the putative P5 element of the wild-type 202-nucleotide btuB leader exhibits AdoCbl-dependent modulation of structure as indicated by the observed increase in spontaneous RNA cleavage at position U68 (10% denaturing PAGE gel). Assays were conducted in the absence (−) or presence (+) of 5 μM of AdoCbl. The remaining lanes are as described in the legend to FIG. 1A. The composite bar graph reflects the ability of the RNA to shift the equilibrium of AdoCbl in an equilibrium dialysis apparatus and the ability of a reporter gene (see Experimental Procedures) to be regulated by AdoCbl addition to a bacterial culture. (Left) Plotted is the cpm ratio derived by equilibrium dialysis, wherein chamber b contains the RNA. Details of the equilibrium dialysis experiments are described in the brief description of FIG. 3. (Right) Plotted are the expression levels of β-galactosidase as determined from cells grown in the absence (−) or presence (+) of 5 μM AdoCbl. Boxed numbers on the left and right, respectively, reflect the approximate K_D and the fold repression of β-galactosidase activity in the presence of AdoCbl. N.D. designates not determined. FIGS. 5B-5F show sequences and performance characteristics of various mutant leader sequences as indicated. Constructs were created as described in the Experimental Procedures section.

FIGS. 6A, 6B, 6C and 6D show metabolite binding by mRNAs. FIG. 6A shows TPP-dependent modulation of the spontaneous cleavage of 165 thiM RNA, visualized by polyacrylamide gel electrophoresis (PAGE). 5′ ³²P-labeled RNAs (arrow, 20 nM) were incubated for approximately 40 hr at 25° C. in 20 mM MgCl₂, 50 mM Tris-HCl (pH 8.3 at 25° C.) in the presence (+) or absence (−) of 100 μM TPP. NR, ⁻OH and T1 represent RNAs subjected to no reaction, partial digestion with alkali, or partial digestion with RNase T1 (G-specific cleavage), respectively. Product bands representing cleavage after selected G residues are numbered and identified by filled arrowheads. The asterisk identifies modulation of RNA structure involving the Shine-Dalgarno (SD) sequence. Gel separations were analyzed using a phosphorimager (Molecular Dynamics) and quantitated using ImageQuant software. FIG. 6B shows a secondary-structure model of 165 thiM (SEQ ID NO:2) as predicted by computer modeling (Zuker et al., Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (eds. Barciszewski J. & Clark, B. F. C.) 11-43 (NATO ASI Series, Kluwer Academic Publishers, 1999); Mathews et al., Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911-940 (1999)) and by the structure probing data depicted in FIG. 6A. Spontaneous cleavage characteristics are as noted in the inset. Unmarked nucleotides exhibit a constant but low level of degradation. The truncated 91 thiM RNA (residues 1-91 of SEQ ID NO:2) is boxed and the thi box element (Miranda-Rios et al., A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. USA 98, 9736-9741 (2001)) is shaded. Nucleotides enclosed in boxes identify an alternative pairing, designated P8*. The RNA carries two mutations (G156A and U157C) relative to wild type that were introduced in a non-essential portion of the construct to form a restriction site for cloning, while all RNAs carry two 5′-terminal G residues to facilitate in vitro transcription. FIG. 6C shows TPP-dependent modulation of the spontaneous cleavage of 240 thiC RNA. Reactions were conducted and analyzed as described above for FIG. 6A. FIG. 6D shows a secondary-structure model of 240 thiC (SEQ ID NO:3). Base-paired elements that are similar to those of thiM are labeled P1 through P5. The truncated RNA 111 thiC (residues 1-111 of SEQ ID NO:3) is boxed. Nucleotides enclosed in boxes identify an alternative pairing.

FIGS. 7A, 7B and 7C show that the thiM and thiC mRNA leaders serve as high-affinity metabolite receptors. FIG. 7A shows the extent of spontaneous modulation of RNA cleavage at several sites within 165 thiM (left) and 240 thiC (right) plotted for different concentrations (c) of TPP. Arrows reflect the estimated concentration of TPP needed to attain half maximal modulation of RNA (apparent K_(D)). FIG. 7B shows the logarithm of the apparent K_(D) values plotted for both RNAs with TPP, TP and thiamine as indicated. The boxed data were generated using TPP with the truncated RNAs 91 thiM and 111 thiC. FIG. 7C shows that patterns of spontaneous cleavage of 165 thiM differ between thiamine and TPP ligands as depicted by PAGE analysis (left) and as reflected by graphs (right) representing the relative phosphorimager counts for the three lanes as indicated. Details for the RNA probing analysis are similar to those described above in connection with FIG. 6A. The graphs were generated by ImageQuant software.

FIGS. 8A, 8B, 8C and 8D show high sensitivity and selectivity of mRNA leaders for metabolite binding. FIG. 8A shows chemical structures of several analogues of thiamine. TD is thiamine disulfide and THZ is 4-methyl-5-β-hydroxyethylthiazole. FIG. 8B shows PAGE analysis of 165 thiM RNA structure probing using TPP and various chemical analogues (40 μM each) as indicated. Locations of significant structural modulation within the RNA spanning nucleotides ˜113 to ˜150 are indicated by open arrowheads. The asterisk identifies the site (C144) used to compare the normalized fraction of RNA that is cleaved (bottom) in the presence of specific compounds. Details for the RNA probing analysis are similar to those described above in connection with FIG. 6A. FIG. 8C shows a summary of the features of TPP that are critical for molecular recognition. FIG. 8D shows equilibrium dialysis using ³H-thiamine as a tracer. Plotted are the ratios for tritium distribution in a two-chamber system (a and b) that were established upon equilibration in the presence of the RNA constructs in chamber b as indicated (see below for a description of the non-TPP-binding mutant M3). 100 μM TPP or oxythiamine were added to chamber a, as denoted, upon the start of equilibration.

FIGS. 9A, 9B, 9C and 9D show mutational analysis of the structure and function of the thiM riboswitch. FIG. 9A shows mutations present in constructs M1 through M8 relative to the 165 thiM RNA (SEQ ID NO:4). P8* is a putative base-paired element between portions (encircled) of the P1 and P8 stems. FIGS. 9B and 9C show in vitro ligand-binding and genetic control functions of the wild-type (WT), M1 and M2 RNAs as reflected by PAGE analysis of in-line probing experiments (10 μM TPP) and by β-galactosidase expression assays. Labels on PAGE gels are as described above in connection with FIG. 6A. Bars represent the levels of gene expression in the presence (+) and the absence (−) of TPP in the culture medium. FIG. 9D is a summary, presented in table form, of similar analyses of WT through M9. The SD status “n.d.” (not determined) indicates either that the level of spontaneous cleavage detected in the absence and presence of TPP is near the limit of detection (M6, M7 and M8) or that the region adopts an atypical structure (M9) compared to WT.

FIG. 10 shows a construct for the selection of SAM-responsive ribozymes (SEQ ID NO:5). The hammerhead self-cleaving ribozyme and the SAM aptamer both require proper formation of the bridge domain to exhibit function. Therefore, the selection is expected to permit ribozyme function only when SAM or another binding-competent analog is present.

FIGS. 11A (SEQ ID NO:6 and SEQ ID NOs:378-382), 11B (SEQ ID NO:7 and SEQ ID NOs:383-385), 11C (SEQ ID NO:8 and SEQ ID NOs:386-387), 11D (SEQ ID NO:9 and SEQ ID NOs:388-389), 11E (SEQ ID NO:10), 11F (SEQ ID NO:11) and 11G (SEQ ID NO:12 and SEQ ID NOs:390-397) show consensus sequences and putative secondary structures that were derived by phylogenetic and biochemical analyses as described for each riboswitch (see references). Nucleotides identified by a lower case a, c, t, or g are conserved in greater than 90% of the representative sequences, open circles identify nucleotide positions of variable sequence, and lines identify elements that are variable in sequence and length. Models are described as follows: 11A) coenzyme B12 aptamer (Example 1); 11B) TPP aptamer (Example 2); 11C) FMN aptamer (Example 3); 11D) SAM aptamer (Example 7); 11E) guanine aptamer (Example 6); 11F) adenine aptamer (Example 8); and 11G) lysine aptamer (Example 5). Letters R and Y represent purine and pyrimidine bases, respectively; K designates G or U; W designates A or U; H designates A, C, or U; D designates G, A, or U; N represents any of the four bases.

FIGS. 12A (SEQ ID NO:13), 12B and 12C show the regulation of the B. subtilis ribD mRNA by FMN. FIG. 12A shows the results of in-line probing assays. Internucleotide linkages identified with squares exhibit decreased amounts of spontaneous cleavage when ribD is incubated in the presence of FMN (indicating an increase in order for these nucleotides) relative to incubation in the absence of FMN. Circles identify linkages that exhibit consistently high levels of scission, which indicates they are not modulated by the presence of FMN. FIG. 12B shows a model for the mechanism of ribD regulation. The ribD mRNA adopts an anti-termination conformation in the absence of FMN. Increased levels of FMN stabilize an RFN-FMN complex that permits formation of the terminator structure. FIG. 12C shows the chemical structure and apparent dissociation constants for riboflavin and FMN.

FIGS. 13A (residues 1-91 of SEQ ID NO:2), 13B and 13C show the regulation of the E. coli thiM mRNA by TPP. FIG. 13A shows results of in-line probing assays. Internucleotide linkages identified with squares exhibit decreased amounts of spontaneous cleavage when thiM is incubated in the presence of TPP compared to incubation in the absence of ligand. In contrast, linkages identified with hexagons exhibit increased amounts of cleavage when thiM is incubated with TPP compared to incubation in the absence of ligand. The boxed nucleotides indicate the pyrophosphate-recognition region (as described in text). FIG. 13B shows a model for the mechanism of thiM regulation. In the absence of TPP, the anti-SD sequence interacts with part of the aptamer domain to form an anti-anti-SD. As TPP is increased, aptamer-TPP complexes are formed and the anti-SD favors pairing with the SD. FIG. 13C shows the chemical structure and apparent dissociation constants for thiamine and TPP.

FIGS. 14A, 14B and 14C show putative eukaryote riboswitches. FIG. 14A shows the consensus TPP binding domain based on 100 bacterial and archaeal RNAs (SEQ ID NO:18 and SEQ ID NOs:398-399). Nucleotides shown as lower case letters are most conserved (>90%). Open circles represent nucleotide positions, and domains that vary in sequence and length are designated var. The consensus model is similar to that reported recently (Rodionov et al., 2002). FIG. 14B shows the TPP-binding domain of A. thaliana (SEQ ID NO:14). Variations in O. sativa (nucleotides enclosed in a circle) (SEQ ID NO:15) and P. secunda (nucleotides enclosed in a hexagon) (SEQ ID NO:16) are shown. FIG. 14C shows a putative TPP-binding domain in the intron of N. crassa (SEQ ID NO:17).

FIG. 15 shows sequence alignments of eukaryotic domains related to bacterial TPP-dependent riboswitches, Eco1, Eco2, Cac1, Ncr1, Aor1, Fox1, Fso1, Ath1, Pse1, Osa1, which are represented by SEQ ID NOs:19-28 respectively. Base paired stems are shaded in black and labeled as defined in Example 2. The P3 sequences, which in eukaryotes are significantly expanded in length and number of base pairs, are represented as a stem-loop structure. The highly conserved nucleotide positions in bacteria that were used to search for eukaryotic domains are enclosed in a box. For each identified (ID) sequence, the position of the conserved CUGAGA sequence within the given Genbank entry is given along with the accession identification, sequence name, and gene identification. Additional protein annotations based on sequence similarity are shown in brackets. Methods: Riboswitch-like domains were initially identified by sequence similarity to bacterial sequences (Eco2 and Cac) by a blastn search of Genbank using default parameters. These hits were verified and expanded by searching for degenerate matches to the pattern (CTGAGA [200] ACYTGA [5]<<<GNTNNNNC>>>[5] CGNRGGRA) (SEQ ID NO:375). Angle brackets indicate base pairing and bracketed numbers are variable gaps with constrained maximum lengths. All of the eukaryotic sequences have one or zero mismatches to this pattern except for one (Aor) that initially had three mismatches due to a single A insertion in the final search element. This mutation was removed to simplify the alignment. Comparison of mRNA (M33643.1) and genomic (AB033416.1) sequences demonstrated that the F. oxysporum element is in an intron in the 5′ UTR of the sti35 gene. Other fungal sequences (Ncr, Aor, and Fso) are flanked by consensus splicing sequences.
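The degenerate pattern search described for FIG. 15 can be illustrated with a short sketch. The following Python example is not the program used to generate FIG. 15; it is a minimal illustration under stated assumptions. It assumes that `<<<` and `>>>` each stand for three nucleotides that must be Watson-Crick complementary to one another, that each bracketed number is a maximum gap length with a minimum of zero, and that no mismatches are tolerated (the text above reports hits with up to one mismatch). The names `degenerate`, `is_paired` and `find_domains` are hypothetical.

```python
import re

# IUPAC degenerate codes used in the search pattern (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

def degenerate(element):
    """Translate a degenerate sequence element into a regular expression."""
    return "".join(IUPAC[base] for base in element)

# Assumed reading of: CTGAGA [200] ACYTGA [5] <<<GNTNNNNC>>> [5] CGNRGGRA
PATTERN = re.compile(
    degenerate("CTGAGA") + "[ACGT]{0,200}?"
    + degenerate("ACYTGA") + "[ACGT]{0,5}?"
    + "(?P<left>[ACGT]{3})" + degenerate("GNTNNNNC") + "(?P<right>[ACGT]{3})"
    + "[ACGT]{0,5}?" + degenerate("CGNRGGRA")
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_paired(left, right):
    """True if right is the reverse complement of left (Watson-Crick only)."""
    return right == left.translate(COMPLEMENT)[::-1]

def find_domains(dna):
    """Return (start, end) spans that match the pattern and pass the stem check.

    Limitation of this sketch: the regular expression reports one gap
    arrangement per start site, so a candidate whose first arrangement
    fails the pairing check is not retried with other gap lengths.
    """
    return [
        (m.start(), m.end())
        for m in PATTERN.finditer(dna.upper())
        if is_paired(m.group("left"), m.group("right"))
    ]

if __name__ == "__main__":
    # Synthetic test sequence, invented for illustration only.
    example = "GGCTGAGAAAAAACCTGATGCAGATAAAACTGCCGAAGGAATT"
    print(find_domains(example))
```

A search that tolerates mismatches, as in the analysis described above, would need an approximate-matching step that this sketch does not attempt.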

FIGS. 16A and 16B show the structural probing of the putative TPP-riboswitch from Arabidopsis. FIG. 16A shows the fragmentation pattern of the 128-nucleotide RNA (arrow) of A. thaliana (FIG. 14B) which was generated by incubation in the absence (−) or presence (+) of 100 μM TPP. T1, ⁻OH and NR identify RNAs that were partially digested with RNase T1 (cleaves 3′ to G residues), alkali, or were not reacted, respectively. Reactions were conducted as described in Example 2. FIG. 16B shows the apparent K_(D) for TPP binding by the A. thaliana RNA. Fraction bound was determined by in-line probing as described in Examples 1-3.

FIG. 17 shows genetic structures of thiamine biosynthetic genes and possible mechanisms of riboswitch control. The location and mechanism of the E. coli and B. subtilis riboswitches are detailed in Examples 2 and 6. The putative TPP riboswitch from P. secunda resides immediately upstream from the polyA tail in the cDNA clone of the THIC gene. The putative TPP riboswitch domain in F. oxysporum is located in a 5′-UTR intron of the STI35 gene according to the genomic sequence but is absent in the cDNA clone.

FIGS. 18A and 18B show the L box, a highly conserved sequence and structural domain that is present in the 5′-UTRs of Gram-positive and Gram-negative bacterial mRNAs that are related to lysine metabolism. Conserved portions of the L box sequence and secondary structure were identified by alignment of 28 representative mRNAs as noted. Base pairing potential representing P1 through P5 are enumerated and set off by boxes. Nucleotides shown as lower case letters are conserved in greater than 80% of the examples. The asterisk identifies the representative (B. subtilis lysC 5′-UTR) that was examined in this study. Gene names are as annotated in GenBank or were derived by protein sequence similarity. Organism abbreviations are as follows: Bacillus anthracis (BA), Bacillus halodurans (BH), Bacillus subtilis (BS), Clostridium acetobutylicum (CA), Clostridium perfringens (CP), Escherichia coli (EC), Haemophilus influenzae (HI), Oceanobacillus iheyensis (OI), Pasteurella multocida (PM), Staphylococcus aureus (SA), Staphylococcus epidermidis (SE), Shigella flexneri (SF), Shewanella oneidensis (SO), Thermotoga maritima (TM), Thermoanaerobacter tengcongensis (TT), Vibrio cholerae (VC), Vibrio vulnificus (VV), Thermoanaerobacter tengcongensis (TE).

FIGS. 19A (SEQ ID NO:60 and SEQ ID NOs:400-408), 19B and 19C (SEQ ID NO:61) show that the consensus L box motif from the lysC 5′-UTR of B. subtilis undergoes allosteric rearrangement in the presence of L-lysine. FIG. 19A shows the consensus sequence and structure of the L box domain as derived using a phylogeny of 31 representative sequences from prokaryotic and archaeal organisms (FIG. 18): BA 0845, BA lysA, BA lysP, BH dapA, BH lysC, BH nhaC, BS lysC, BX lysC, CA lysA, CP lysA, CP lysP, EC lysC, HI nhaC, OI dapA, OI nhaC, PM nhaC, SA lysC, SA lysP, SE lysC, SE lysP, SF lysC, SO lysC, SO nhaC, TM asd, TT lysA, TT pspF, VC lysC, VC nhaC, VC nhaC, VV lysC, VV nhaC, which are represented by SEQ ID NOs:29-59, respectively. Nucleotides depicted as lower case a, c, t, or g are present in at least 80% of the representatives, open circles identify nucleotide positions of variable identity, and dashed lines denote variable nucleotide identity and chain length. FIG. 19B shows the sequence, secondary structure model, and lysine-induced structural modulation of the lysC 5′-UTR of B. subtilis. An additional 94 nucleotides (not depicted) reside between nucleotide 237 and the AUG start codon. Structural modulation sites (nucleotides enclosed in squares) were established using 237 lysC RNA by monitoring spontaneous RNA cleavage as depicted in FIG. 19C. FIG. 19C shows that in-line probing of the 237 lysC RNA reveals lysine-induced modulation of RNA structure. Patterns of spontaneous cleavage, revealed by product separation using denaturing 10% polyacrylamide gel electrophoresis (PAGE), are altered at four major sites (denoted 1 through 4) in the presence (+) of 10 μM L-lysine (L) relative to that observed in the absence (−) of lysine. T1, ⁻OH and NR represent partial digest with RNase T1, partial digest with alkali, and no reaction, respectively. Selected bands in the T1 lane (G-specific cleavage) are identified by nucleotide position. See Methods for experimental details.

FIGS. 20A, 20B, 20C, 20D and 20E show the molecular recognition characteristics of the lysine aptamer and the use of caged lysine. FIG. 20A shows the chemical structures of L-lysine, D-lysine and nine closely-related analogs. Small circles represent chiral carbon centers wherein the enantiomeric configuration is defined for each compound. Encircled atoms identify chemical differences between L-lysine and the analog depicted. FIG. 20B shows in-line probing analysis of the 179 lysC RNA in the absence (−) of ligand, or in the presence of 10 μM L-lysine or 100 μM of various analogs as indicated for each lane. For each lane, the relative extent of spontaneous cleavage at site 3 is compared to that of the zone of constant cleavage immediately below this site, where a cleavage ratio significantly below ˜1.5 reflects modulation. FIG. 20C shows a schematic representation of dipeptide digestion by hydrochloric acid. All dipeptide forms are expected to be incapable of binding the lysine aptamer (inactive), while lysine-containing dipeptides should induce conformational changes in the aptamer (active) upon acid digestion. FIG. 20D shows in-line probing analysis of the 179 lysC RNA in the absence of lysine (−) or in the presence of various amino acids and dipeptides. Underlined lanes carry dipeptide preparations that were pretreated with HCl as depicted in FIG. 20C. FIG. 20E shows the fraction of spontaneous cleavage at site 3 in FIG. 20D, plotted after normalization to the extent of processing in the absence of added ligand.

FIGS. 21A, 21B, 21C and 21D show determination of the dissociation constant and stoichiometry for L-lysine binding to the 179 lysC RNA. FIG. 21A shows in-line probing with increasing concentrations of L-lysine ranging from 3 nM to 3 mM. Details are as defined for FIG. 19C. FIG. 21B shows a plot depicting the normalized fraction of RNA undergoing spontaneous cleavage versus the concentration of amino acid for sites 1 through 3. The dashed line identifies the concentration of L-lysine required to bring about half-maximal structural modulation, which indicates the apparent K_(D) for ligand binding. FIG. 21C shows that the 179 lysC RNA (10 μM) shifts the equilibrium of tritiated L-lysine (50 nM) in an equilibrium dialysis chamber. To investigate competitive binding, unlabeled L- (L) and D-lysine (D), or L-ornithine (5) were added to a final concentration of 50 μM each to one chamber of a pre-equilibrated assay as indicated. FIG. 21D shows a Scatchard analysis of L-lysine binding by the 179 lysC RNA. The variable r represents the ratio of bound ligand concentration versus the total RNA concentration and the variable [L_(F)] represents the concentration of free ligand.
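For orientation, a Scatchard analysis of the kind referred to for FIG. 21D can be sketched as follows. The example assumes the conventional linear form r/[L_F] = (n − r)/K_D, in which a plot of r/[L_F] against r has slope −1/K_D and an intercept on the r axis equal to the number of binding sites n. The axes actually used in FIG. 21D are not stated in this description, so that form is an assumption, and the function name `scatchard_fit` and the data values are hypothetical.

```python
def scatchard_fit(r_values, free_ligand):
    """Least-squares fit of r/[L_F] = (n - r)/K_D.

    r_values    : bound ligand / total RNA for each sample
    free_ligand : free ligand concentration [L_F] for each sample
    Returns (K_D, n) in the units of free_ligand and in sites per RNA.
    """
    xs = list(r_values)
    ys = [r / lf for r, lf in zip(r_values, free_ligand)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals -1/K_D
    intercept = mean_y - slope * mean_x   # equals n/K_D
    k_d = -1.0 / slope
    n_sites = -intercept / slope          # intercept on the r axis
    return k_d, n_sites

if __name__ == "__main__":
    # Synthetic data for a single site with K_D = 1.0 (arbitrary units);
    # these values are invented and are not taken from FIG. 21D.
    free = [0.1, 0.5, 1.0, 5.0, 10.0]
    bound_ratio = [lf / (1.0 + lf) for lf in free]
    print(scatchard_fit(bound_ratio, free))
```

A fitted n near 1 would be consistent with one ligand bound per RNA; real data carry noise, so a fit of this kind gives an estimate and not an exact stoichiometry.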

FIGS. 22A, 22B and 22C show the B. subtilis lysC riboswitch and its mechanism for metabolite-induced transcription termination. FIG. 22A shows a sequence and repressed-state model for the lysC riboswitch secondary structure (SEQ ID NO:62). The encircled nucleotides identify the putative anti-terminator interaction that could form in the absence of L-lysine. Boxed nucleotides identify sites of disruption (M1) and compensatory mutations for the terminator stem (M2) and for the terminator and anti-terminator stems (M3). Nucleotides enclosed in squares identify some of the positions where mutations that exhibit lysC derepression were reported previously (Vold et al. 1975; Lu et al. 1992). FIG. 22B shows in vitro transcription assays conducted in the absence (−) or presence (+) of 10 mM L-lysine or other analogs as indicated. FL and T identify the full-length and terminated transcripts, respectively. The percent of the terminated RNAs relative to the total terminated and full-length transcripts is provided for each lane (% term.). FIG. 22C shows in vivo expression of a β-galactosidase reporter gene fused to wild-type (WT), G39A and G40A mutant lysC 5′-UTR fragments. Media conditions are as follows: I, normal medium (0.27 mM lysine); II, minimal medium (0.012 mM); III, lysine-supplemented minimal medium (1 mM); IV, lysine hydroxamate-supplemented (medium II plus 1 mM lysine hydroxamate) minimal medium; V, thiosine-supplemented (medium II plus 1 mM thiosine) minimal medium.

FIG. 23 shows that a highly conserved domain is present in the 5′-UTR of certain gram-positive and gram-negative bacterial mRNAs. Depicted is an alignment of 32 representative mRNA domains from bacteria that conform to the G box consensus sequence BH1-guaA, BH2-[pbuG], BH3-purE, BH4-ssnA, BH5-[xpt], BS1-[pbuG], BS2-purE, BS3-xpt, BS4-yxjA, BS5-ydhL, CA1-uraA, CA2-[pbuG], CA3-guaB, CP1-xpt, CP2-uapC, CP3-guaB, CP4-add, FN1-purQ, LL1-xpt, LM1-[pbuG], LM2-[xpt], OI1-guaA, OI2-[pbuG], OI3-purE, OI4-[xpt], SA1-xpr, TSE1-[xpt], STA1-xpt, STPY1-xpt, STPN-xpt, TE1-[pbuG], VV1-add, which are represented by SEQ ID NOs:63-94 respectively. Enclosed and enumerated regions identify base-pairing potential of stems P1, P2, and P3, respectively. Nucleotides shown as lower case letters are conserved in greater than 90% of the examples. The asterisk identifies the representative (xpt-pbuX 5′-UTR) that was examined in this study. It is important to note that three representatives (BS5, CP4 and VV1) that carry a C to U mutation in the conserved core (in the P3-P1 junction) appear to be adenine-specific riboswitches (unpublished observations). Gene names are as annotated in GenBank, the SubtiList database, or based on protein similarity searches (brackets). Organism abbreviations are as follows: Bacillus halodurans (BH), Bacillus subtilis (BS), Clostridium acetobutylicum (CA), Clostridium perfringens (CP), Fusobacterium nucleatum (FN), Lactococcus lactis (LL), Listeria monocytogenes (LM), Oceanobacillus iheyensis (OI), Staphylococcus aureus (SA), Staphylococcus epidermidis (SE), Streptococcus agalactiae (STA), Streptococcus pyogenes (STPY), Streptococcus pneumoniae (STPN), Thermoanaerobacter tengcongensis (TE), and Vibrio vulnificus (VV).

FIGS. 24A, 24B and 24C show that the G box RNA of the xpt-pbuX mRNA in B. subtilis responds allosterically to guanine. FIG. 24A shows the consensus sequence and secondary model for the G box RNA domain that resides in the 5′ UTR of genes that are largely involved in purine metabolism (SEQ ID NO:95). Phylogenetic analysis is consistent with the formation of a three-stem (P1 through P3) junction. Nucleotides shown as lower case letters and capitals are present in greater than 90% and 80% of the representatives examined, respectively (FIG. 23). Encircled nucleotides exhibit base complementation, which might indicate the formation of a pseudoknot. FIG. 24B shows sequence and ligand-induced structural alterations of the 5′-UTR of the xpt-pbuX transcriptional unit (SEQ ID NO:96). The putative anti-terminator interaction is represented by the boxes. Nucleotides that undergo structural alteration as determined by in-line probing (FIG. 24C) are identified with squares. The 93 xpt fragment (boxed) of the 201 xpt RNA retains guanine-binding function. Asterisks denote alterations to the RNA sequence that facilitate in vitro transcription (5′ terminus) or that generate a restriction site (3′ terminus). Nucleotide numbers begin at the first nucleotide of the natural transcription start site. The translation start codon begins at position 186. FIG. 24C shows that guanine and related purines selectively induce structural modulation of the 93 xpt mRNA fragment. Precursor RNAs (Pre; 5′ ³²P-labeled) were subjected to in-line probing by incubation for 40 hr in the absence (−) or presence of guanine, hypoxanthine, xanthine and adenine as indicated by G, H, X and A, respectively. Lanes designated NR, T1 and ⁻OH contain RNA that was not reacted, subjected to partial digestion with RNase T1 (G-specific cleavage), or subjected to partial alkaline digestion, respectively. Selected bands corresponding to G-specific cleavage are identified. Regions 1 through 4 identify major sites of ligand-induced modulation of spontaneous RNA cleavage.

FIGS. 25A and 25B show that the 201 xpt mRNA leader binds guanine with high affinity. FIG. 25A shows that in-line probing reveals that spontaneous RNA cleavage of the 201 xpt RNA at four regions decreases with increasing guanine concentrations. Only those locations of the PAGE image corresponding to the four regions of modulation as indicated in FIG. 24C are depicted. Other details and notations are as described in the legend to FIG. 24C. FIG. 25B shows a plot depicting the normalized fraction of RNA that experienced spontaneous cleavage versus the concentration of guanine for modulated regions 1 through 4 in FIG. 25A. Fraction cleaved values were normalized to the maximum cleavage measured in the absence of guanine and to the minimum cleavage measured in the presence of 10 μM guanine. The apparent K_(D) value (less than or equal to 5 nM) reflects the limits of detection for these assay conditions.
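The normalization described for FIG. 25B, and the reading of an apparent K_D as the ligand concentration giving half-maximal modulation, can be illustrated by the following sketch. It is a simplified illustration and not the analysis software used for the figure: the interpolation on a logarithmic concentration axis is an assumed convenience, the function names `normalize` and `apparent_kd` are hypothetical, and the numerical values are invented.

```python
import math

def normalize(fractions, f_max, f_min):
    """Scale fraction-cleaved values so that f_max maps to 1 and f_min to 0.

    f_max : cleavage measured in the absence of ligand
    f_min : cleavage measured at saturating ligand
    """
    span = f_max - f_min
    return [(f - f_min) / span for f in fractions]

def apparent_kd(concentrations, normalized):
    """Estimate the concentration at which the normalized signal equals 0.5.

    Interpolates linearly in log10(concentration) between the two points
    that bracket 0.5. Concentrations must be positive and ascending.
    Returns None if the data never cross 0.5.
    """
    points = list(zip(concentrations, normalized))
    for (c0, y0), (c1, y1) in zip(points, points[1:]):
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            t = (y0 - 0.5) / (y0 - y1)
            log_c = math.log10(c0) + t * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None

if __name__ == "__main__":
    # Hypothetical titration (molar concentrations and raw fraction cleaved).
    concs = [1e-9, 1e-8, 1e-7]
    raw = [0.28, 0.20, 0.12]
    scaled = normalize(raw, f_max=0.30, f_min=0.10)
    print(apparent_kd(concs, scaled))
```

When the half-maximal point lies at or below the lowest concentration that the assay can resolve, as noted above for the 201 xpt RNA, an estimate of this kind is only an upper bound.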

FIGS. 26A, 26B and 26C show molecular discrimination by the guanine-binding aptamer of the xpt-pbuX mRNA. FIG. 26A shows the chemical structures and apparent K_(D) values for guanine, hypoxanthine and xanthine (active natural regulators of xpt-pbuX genetic expression in B. subtilis) versus that of adenine (inactive). Differences in chemical structure relative to guanine are encircled. K_(D) values were established as shown in FIG. 25 with the 201 xpt RNA. Numbers on guanine represent the positions of the ring nitrogen atoms. FIG. 26B shows chemical structures and K_(D) values for various analogs of guanine, which reveal that all alterations of this purine cause a loss of binding affinity. Open circles identify K_(D) values that most likely are significantly higher than indicated, as concentrations of analog above 500 μM were not examined in this analysis. The apparent K_(D) values of G, H, X and A as indicated are plotted as triangles for comparison. FIG. 26C shows a schematic representation of the molecular recognition features of the guanine aptamer in 201 xpt. Hydrogen bond formation at position 9 of guanine is expected because guanosine (K_(D)>100 μM) and inosine (K_(D)>100 μM), which are 9-ribosyl derivatives of guanine and hypoxanthine, respectively, do not exhibit measurable binding (see FIG. 27).

FIGS. 27A and 27B show confirmation of guanine binding specificity by equilibrium dialysis. FIG. 27A shows the equilibrium dialysis strategy that was used to confirm that in vitro-transcribed 93 xpt RNAs bind to guanine and can discriminate against various analogs. Each data point was generated by adding ³H-guanine to chamber a, which is separated from RNA and other analogs by a dialysis membrane with a molecular weight cut-off (MWCO) of 5,000 daltons. Left: If no guanine binding sites are present in chamber b, or if an excess of unlabeled competitor is present, then no shift in the distribution of tritium is expected. Right: If an excess of guanine-binding RNAs is present in chamber b, and if no competitor is present, then a substantial shift in the distribution of tritium towards chamber b is expected. FIG. 27B shows that the 93 xpt RNA can shift the distribution of ³H-guanine in an equilibrium dialysis apparatus, while analogs of guanine are poor competitors. The plot depicts the fraction of counts per minute (cpm) of tritium in chamber b relative to the total amount of cpm counted from both chambers. A value of ˜0.5 is expected if no shift occurs, as is the case when RNA is absent (none), or in the presence of excess unlabeled competitor (G). A value approaching 1 is expected if the majority of ³H-guanine is bound by the RNA in chamber b in the absence (−) of unlabeled analog, or in the presence of unlabeled analogs that do not serve as effective competitors under the assay conditions (100 nM ³H-guanine, 300 nM RNA, 500 nM analog). Ino and Gua represent inosine and guanosine, respectively.
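The expected tritium distribution described for FIG. 27B can be approximated with a simple mass-balance sketch. The model below rests on assumptions that are not stated in this description: the two chambers have equal volume, the free ligand concentration is the same on both sides at equilibrium, the RNA is confined to chamber b, binding is 1:1, and the quoted ligand concentration is the total ligand divided by the volume of one chamber. Competition by unlabeled analogs is not modeled. The function name `fraction_in_chamber_b` is hypothetical, and the K_D of 5 nM used in the usage lines is borrowed from the 201 xpt measurement purely for illustration.

```python
import math

def fraction_in_chamber_b(ligand_total, rna_total, k_d):
    """Fraction of labeled ligand expected in chamber b at equilibrium.

    All arguments share one concentration unit (for example nM).
    Mass balance with free ligand f in each chamber and bound ligand
    rna_total * f / (k_d + f) in chamber b:
        2*f + rna_total * f / (k_d + f) = ligand_total
    which rearranges to the quadratic
        2*f**2 + (2*k_d + rna_total - ligand_total)*f - ligand_total*k_d = 0
    """
    b = 2.0 * k_d + rna_total - ligand_total
    discriminant = b * b + 8.0 * ligand_total * k_d
    free = (-b + math.sqrt(discriminant)) / 4.0   # positive root
    # Chamber a holds only free ligand, so chamber b holds the remainder.
    return 1.0 - free / ligand_total

if __name__ == "__main__":
    # No RNA: the label divides evenly between the chambers.
    print(fraction_in_chamber_b(100.0, 0.0, 5.0))
    # 100 nM ligand, 300 nM RNA, assumed K_D of 5 nM.
    print(fraction_in_chamber_b(100.0, 300.0, 5.0))
```

Under these assumptions the model reproduces the two limiting cases described above: a fraction of about 0.5 without RNA, and a fraction approaching 1 when a tightly binding RNA is present in excess.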

FIGS. 28A, 28B, 28C and 28D show the binding and genetic control functions of variant guanine riboswitches. FIG. 28A shows mutations used to examine the importance of various structural features of the guanine aptamer domain (SEQ ID NO:97). FIG. 28B shows examination of the binding function of aptamer variants by equilibrium dialysis. WT designates the wild-type 93 xpt construct. Details are as described for FIG. 27. FIG. 28C shows genetic modulation of a β-galactosidase reporter gene upon the introduction of various purines as indicated. FIG. 28D shows regulation of β-galactosidase reporter gene expression by WT and mutants M1 through M7. Open and filled bars represent enzyme activity generated when growing cells in the absence and presence of guanine, respectively.

FIGS. 29A, 29B and 29C show that riboswitches participate in fundamental genetic control. FIGS. 29A and 29B are schematic representations of the seven known riboswitches and the metabolites they sense. The secondary structure models were obtained as follows: coenzyme B₁₂ (see Example 1); TPP (see Example 2); FMN (see Example 3); SAM (see Example 7); guanine (see Example 6); lysine (see Example 5); adenine (see Example 8). Coenzyme B₁₂ is depicted in exploded form wherein a, b and c designate covalent attachment sites between fragments. FIG. 29C shows a genetic map of B. subtilis riboswitch regulons and their positions on the bacterial chromosome. Genes are controlled by riboswitches as identified by matching numbers. All nomenclature is derived from the SubtiList database release R16.1 (Moszer, I., et al., 1995, Microbiol. 141, 261-268) except for metI and metC, which are recent designations (Auger, S., et al., 2002, Microbiol. 148, 507-518).

FIGS. 30A, 30B and 30C show that the S box is a structured RNA domain that binds SAM. FIG. 30A shows the consensus sequence and secondary-structure model of the S box domain derived from 107 bacterial representatives (SEQ ID NO:98 and SEQ ID NOs:409-410). Lower case letter and capital letter positions identify nucleotides whose identity as depicted is conserved in greater than 90% or 80% of the representative S box RNAs, respectively. R, Y, and N represent purine, pyrimidine, and any nucleotide, respectively. P1 through P4 identify conserved base pairing. Enclosed nucleotides identify a putative pseudoknot interaction. FIG. 30B shows a sequence and secondary structure model for the 251 yitJ mRNA fragment (SEQ ID NO:99). Sites of structural modulation upon introduction of SAM are depicted as described. Nucleotide 1 corresponds to the putative transcriptional start site. Asterisks identify nucleotides that were added to the construct to permit efficient transcription in vitro. The first nucleotide of the AUG start codon is 212 (not shown). Other notations are as described in FIG. 30A. FIG. 30C shows the spontaneous cleavage patterns of 251 yitJ (˜1 nM 5′ ³²P-labeled) RNA incubated for ˜40 hr at 25° C. in 50 mM Tris-HCl (pH 8.3 at 25° C.), 20 mM MgCl₂, 100 mM KCl, and without (−) or with methionine or SAM as indicated for each lane. NR, T1 and ⁻OH represent no reaction, partial digest with RNase T1, and partial digest with alkali, respectively. Certain fragment bands corresponding to T1 digestion (cleaves after G residues) are depicted. Arrowheads identify positions of significant modulation of spontaneous cleavage, and the numbered sites were used for quantitation (see FIG. 31B). Experimental procedures are similar to those described in Examples 1-3.
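The consensus notation described for FIG. 30A, in which lower case letters mark positions conserved in greater than 90% of representatives and capital letters mark positions conserved in greater than 80%, can be illustrated with the sketch below. It is not the procedure used to prepare the figure. It assumes a gap-free alignment of equal-length sequences, uses the character "o" as a stand-in for the open circles of variable positions, and does not assign the degenerate codes R, Y or N; the function name `consensus` is hypothetical.

```python
from collections import Counter

def consensus(aligned, high=0.90, low=0.80, variable="o"):
    """Build a consensus string from equal-length aligned sequences.

    A position is written in lower case when its most common nucleotide
    occurs in more than `high` of the sequences, in upper case when it
    occurs in more than `low`, and as `variable` otherwise.
    """
    out = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        frequency = count / len(column)
        if frequency > high:
            out.append(base.lower())
        elif frequency > low:
            out.append(base.upper())
        else:
            out.append(variable)
    return "".join(out)

if __name__ == "__main__":
    # Ten invented three-nucleotide sequences, for illustration only.
    sequences = ["AGC"] * 5 + ["AGU"] * 4 + ["AAU"]
    print(consensus(sequences))
```

In this toy alignment the first column is invariant, the second is conserved in exactly 90% of the sequences (so it meets the 80% threshold but not the stricter one), and the third is split evenly.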

FIGS. 31A, 31B and 31C show the binding affinity and molecular discrimination by a SAM-binding RNA. FIG. 31A shows the chemical structures of various compounds used to probe the binding characteristics of the SAM yitJ riboswitch. Other than methionine, each compound as depicted is coupled to an adenosyl moiety ([A]; inset) via the 5′ carbon (as signified by R). FIG. 31B, left: The K_(D) of 251 yitJ for SAM was determined by plotting the normalized fraction of RNA cleaved at regions 1 through 6 (see FIG. 30C) versus the logarithm of the concentration of SAM in molar units. The dashed line indicates the concentration needed to induce half maximal modulation of cleavage activity. Right: K_(D) values for SAM and various analogs as determined by this method. FIG. 31C shows molecular discrimination determined by equilibrium dialysis. Assays employed 100 nM of S-adenosyl-L-methionine-methyl-³H (³H-SAM; 14.5 μCi mmol⁻¹; ˜7,000 cpm) added to side A of an equilibrium dialysis chamber (1, 2), and were conducted in the absence (none) or the presence of 3 μM RNA on the B side of the chamber as indicated. Equilibrations were carried out for ˜10 hr in the absence (−) of unlabeled analogs, and then were subsequently incubated in the presence of 25 μM unlabeled compounds (added to side B) as indicated. M1 is a variant of 124 yitJ that carries disruptive mutations in the junction between stems P1 and P2 (FIG. 32A). The line at a cpm ratio of 1 identifies the bar height expected if a shift in ³H-SAM has not occurred. Additional experimental details are similar to those described in Examples 1 and 2.

FIGS. 32A, 32B and 32C show the effects of RNA mutations on SAM binding and genetic control. FIG. 32A shows the sequence and secondary structure model for the 124 yitJ RNA (SEQ ID NO:100). Mutations M1 through M9 were generated in plasmids containing fusions of the yitJ 5′-UTR upstream from a lacZ reporter gene. Templates for preparation of mutant RNAs for in vitro studies were then created by PCR, and the mutant DNA constructs were integrated into the chromosome for in vivo studies. See Methods for experimental details. FIG. 32B shows the analysis of SAM-binding function by equilibrium dialysis in the presence of wild-type (WT) and mutant RNAs as denoted. Details are described in the legend to FIG. 31C, except that 300 nM RNA was used and all assays were conducted without the addition of unlabeled analogs. FIG. 32C shows in vivo control of β-galactosidase expression in B. subtilis cells transformed with various riboswitch constructs as indicated. β-galactosidase activities were measured as described in Example 2. Cells were grown in glucose minimal media in 0.75 μg mL⁻¹ methionine (−) or 50 μg mL⁻¹ methionine (+). M6 through M9 were not examined in vivo.

FIGS. 33A, 33B, 33C and 33D show metabolite-induced transcription termination of several mRNAs that carry a SAM riboswitch. FIG. 33A shows that in vitro transcription using T7 RNA polymerase results in increased termination of four mRNA leader sequences. Reactions were conducted in the absence (−) or presence (+) of 50 μM of the effector as indicated for each lane. For example, the metI template includes the 5′ UTR and coding sequences through mRNA position 242, while the termination site is expected to occur at position 189. Below each gel is indicated the percentage of transcription termination (T) at the expected location relative to the total amount of expected termination plus full length RNA (FL). FIGS. 33B-33D show the sequence and structural model for the metI riboswitch in two structural states (SEQ ID NO:101). Residues shown in hexagons and squares correspond to the P1 (anti-anti-terminator) and the terminator stems, respectively. The encircled residues correspond to the anti-terminator stem. Sequences boxed in black define the location and identity of mutations used to examine the proposed mechanism of genetic control. Gel: Analysis of mutant metI riboswitches wherein disruptive (Ma, Mab and Mc) or the corresponding compensatory mutations (Mabc) have been inserted. The metI mutant templates and wild-type control template (WT) are identical to the templates used in FIG. 33A, except that the FL product is 220 nucleotides. Other notations are as described in FIG. 33A.

FIGS. 34A and 34B show that cysH aptamers from Bacillus subtilis and Bacillus anthracis bind SAM with different affinities. FIG. 34A shows structural modulation of the B. subtilis cysH aptamer as determined by in-line probing (SEQ ID NO:102). Inset: Apparent K_(D) values determined by monitoring structural modulation over a range of SAM or SAM analog concentrations. Two G residues (asterisks) were included at the 5′ terminus of the RNA construct to facilitate in vitro transcription. Nucleotide numbers are given relative to the putative transcription start site. In-line probing was conducted with an RNA extending to nucleotide 117, while the remainder of the RNA is shown to depict the putative transcription terminator stem. Experiments were similar to those described in FIG. 30B and FIG. 31B. See the legend for FIG. 30B for details. FIG. 34B shows structural modulation of the B. anthracis cysH aptamer as determined by in-line probing (SEQ ID NO:103). The transcription start point of the B. anthracis cysH mRNA has not been determined, and so numbering of nucleotides begins immediately after the two inserted G residues (asterisks). In-line probing was conducted with an RNA extending to nucleotide 112. See FIG. 34A for additional details.

FIGS. 35A, 35B and 35C show guanine- and adenine-specific riboswitches. FIG. 35A shows sequence and structural features of the two guanine-specific (purE and xpt) and three adenine-specific aptamer domains that are examined in this study (BS2-purE, BS3-xpt, BS5-ydhL, CP4-add, VV1-add), which are represented by SEQ ID NOs:104-108, respectively. P1 through P3 identify the three base-paired stems comprising the secondary structure of the aptamer domain. Lowercase nucleotides identify positions whose base identity is conserved in greater than 90% of representatives in the phylogeny. The arrow identifies a nucleotide within the conserved core of the aptamer that is a determinant of ligand specificity. BS, CP and VV designate B. subtilis, Clostridium perfringens and Vibrio vulnificus, respectively. FIG. 35B shows sequence and secondary structure of the xpt and ydhL aptamers (SEQ ID NO:109). Encircled nucleotides identify positions within the ydhL aptamer that differ from those in the xpt aptamer. The sequence disclosed in FIG. 35C is SEQ ID NO:110. Nucleotides in xpt are numbered as described in Example 6. Other notations are as described in FIG. 35A.

FIGS. 36A, 36B, 36C, 36D and 36E show the ligand specificity of five G box RNAs. FIGS. 36A through 36E show in-line probing assays for the conserved aptamer domains as labeled. NR, T1 and ⁻OH identify marker lanes wherein precursor RNAs (Pre) were not incubated, or were partially digested with RNase T1 or alkali, respectively. Selected bands corresponding to RNase T1 digestion (cleavage 3′ relative to guanidyl residues) are labeled for each RNA. RNAs were incubated for 40 hr in the absence of ligand (−), or in the presence of 1 μM guanine (G) or adenine (A). Large arrowheads identify sites of substantial change in cleavage pattern that is due to the addition of a particular ligand. See Methods for additional details.

FIGS. 37A and 37B show the binding affinity of the ydhL aptamer for adenine. FIG. 37A shows the in-line probing assay for the 80 ydhL RNA at various concentrations of adenine. For each lane, sites 1 through 4 were quantitated and the fraction of RNA cleaved was used to determine the apparent K_(D). FIG. 37B shows a plot of the normalized fraction of RNA that has undergone spontaneous cleavage at sites 1 through 4 versus the concentration of adenine. See Example 8 for additional details.

FIGS. 38A and 38B show the specificity of molecular recognition by the adenine aptamer from ydhL. FIG. 38A Top: Chemical structures of adenine, guanine and other purine analogs that exhibit measurable binding to the 80 ydhL RNA. Chemical changes relative to 2,6-DAP, which is the tightest-binding compound, are encircled. Bottom left: Plot of the apparent K_(D) values for various purines. Bottom right: Model for the chemical features on adenine that serve as molecular recognition contacts for ydhL. Note that the importance of N7 and N9 has not been determined. The encircled arrow indicates that a contact could exist if a hydrogen bond donor is appended to C2. FIG. 38B shows chemical structures of various purines that are not bound by the 80 ydhL RNA (K_(D) values poorer than 300 μM).

FIGS. 39A, 39B, 39C and 39D show interconversion of guanine- and adenine-specific aptamers. FIG. 39A Left: Plot of the normalized fraction of wild-type 93 xpt RNA cleavage product for a given site versus the logarithm of the concentration of ligand present during incubation in an in-line probing assay. Cleavage products monitored for modulation correspond to site 3 (FIG. 37A). Right: Plot of the fraction of the total counts per minute (cpm) present in chamber B relative to the total counts per minute from sides A and B of an equilibrium dialysis chamber. Values of ˜0.5 indicate an equal distribution of ligand (no binding), while values of ˜1 indicate that most of the ligand is bound to the RNA within side B of the chamber. FIGS. 39B, 39C and 39D show in-line probing plots and equilibrium dialysis plots for 93 xpt (C to U mutation), 80 ydhL, and 80 ydhL (U to C mutation), respectively. Details are described in FIG. 39A or in Example 8.

FIGS. 40A, 40B, 40C, 40D and 40E show a model for the genetic control of ydhL by an adenine riboswitch and its function as a gene-activating element. FIG. 40A shows the sequence of the adenine riboswitch from B. subtilis ydhL and secondary structure models for the ‘ON’ and ‘OFF’ states for gene regulation (SEQ ID NO:111). FIG. 40B shows the in vivo function of the wild-type ydhL riboswitch and of a variant form as determined by fusion to a β-galactosidase reporter gene.

FIGS. 41A-41BA show the sequence and types of riboswitches Bs01, Bs02, Bs03, Bs04, Bs05, Bs06, Bs07, Bs08, Bs09, Bs10, Bs11, Bh01, Bh02, Bh03, Bh04, Bh05, Oi01, Oi02, Oi03, Oi04, Oi05, Oi06, Oi07, Oi08, Oi09, Oi10, Oi11, Oi12, Oi13, Ca01, Ca02, Ca03, Ca04, Ca05, Ca06, Ca07, Cp01, Cp02, Lm01, Lm02, Lm03, Lm04, Lm05, Lm06, Lm07, Li01, Li02, Li03, Li04, Li05, Li06, Li07, Sa01, Sa02, Sa03, Sa04, Sc01, Ct01, Tt01, Tt02, Tt03, Fn01, Fn02, Dr01, Dr02, Xa01, Xc01, Se01, Se02, Gs01, Gs02, Ba01, Ba02, Ba03, Ba04, Ba05, Ba06, Ba07, Ba08, Ba09, Ba10, Ba11, Ba12, Ba13, Ba14, Ba15, Ba16, Ba17, Bc01, Bc02, Bc03, Bc04, Bc05, Bc06, Bc07, Bc08, Bc09, Bc10, Bc11, Bc12, Bc13, Bc14, Bc15, Bc16, Bc17, Bc18, Atu01, Atu02, Atu03, Atu04, Atu05, Atu06, Bha01, Bha02, Bha03, Bha04, Bsu01, Bja01, Bja02, Bja03, Bja04, Bja05, Bme01, Bme02, Bme03, Bme04, Cer01, Cer02, Cte01, Cte02, Cte03, Cte04, Cte05, Cac01, Cac02, Cpe01, Cpe02, Cpe03, Cpe04, Eco01, Fnu01, Lig01, Lmo01, Mlo01, Mlo02, Mlo03, Mlo04, Mlo05, Mlo06, Mle01, Mtu01, Mtu02, Pae01, Pae02, Pae03, Pae04, Ppu01, Ppu02, Ppu03, Ppu04, Rso01, Sme01, Sme02, Sme03, Sme04, Sme05, Sco01, Sco02, Sco03, Sco04, Sco05, Sfl01, Son01, Son02, Sti01, Sti02, Tma01, Tte01, Tte02, Vch01, Vvu01, Xac01, Xax01, Ype01, Aca01, Avi01, Bfr01, Bmg01, Lma01, Pfr01, Rca01, Rca02, Rca03, Rsp01, Sbi01, Sgi01, Svi01, Zmo01, Zmo02, NC_002570.1/648448-648540, NC_002570.1/650317-650406, NC_002570.1/676483-676572, NC_002570.1/806882-806965, NC_002570.1/1593067-1592976, NC_000964.1/693955-694038, NC_000964.1/697886-697976, NC_000964.1/2319120-2319031, NC_000964.1/4004319-4004410, NC_003030.1/1002184-1002270, NC_003030.1/2904259-2904168, NC_003030.1/2824539-2824454, NC_003366.1/422828-422924, NC_003366.1/512410-512323, NC_003366.1/2617892-2617807, NC_003454.1/1645257-1645173, NC_002662.1/1159519-1159604, NC_003210.1/610773-610679, NC_003210.1/1958601-1958511, NC_004193.1/760480-760571, NC_004193.1/769695-769781, NC_004193.1/786775-786863, NC_004193.1/1103947-1104044, NC_002745.1/430771-430861, NC_004461.1/2432384-2432294, NC_004116.1/1093950-1093860, NC_002737.1/930757-930842, NC_003028.1/1754791-1754878, NC_003869.1/586372-586463, NC_000964.1/626134-626051, NC_003366.1/2870819-2870732, NC_004460.1/504378-504467, Bha_LysC, Bha_dapA, Bha_nhaC, Bsu_LysC, Cac_lysA, Cpe_nhaC, Cpe_lysA, Cpe_lysP, Eco_lysC, Hin_nhaC, Oih_dapA, Oih_nhaC, Pmu_nhaC, Sau_lysC, Sau_lysP, Sep_lysC, Sep_lysP, Sfl_lysC, Son_lysC, Son_nhaC, Tma_asd, Tte_lysA, Tte_pspF, Vch_lysC, Vch_nhaC, Vch_nhaC, 2Vvu_lysC, Vvu_nhaC, Cons, Cons and Consensus, which are represented by SEQ ID NOs:112-374, respectively.

DETAILED DESCRIPTION OF THE INVENTION

The disclosed methods and compositions can be understood more readily by reference to the following detailed description of particular embodiments and the Example included therein and to the Figures and their previous and following description.

Certain natural mRNAs serve as metabolite-sensitive genetic switches wherein the RNA directly binds a small organic molecule. This binding process changes the conformation of the mRNA, which causes a change in gene expression by a variety of different mechanisms. Modified versions of these natural “riboswitches” (created by using various nucleic acid engineering strategies) can be employed as designer genetic switches that are controlled by specific effector compounds (referred to herein as trigger molecules). The natural switches are targets for antibiotics and other small molecule therapies. In addition, the architecture of riboswitches allows actual pieces of the natural switches to be used to construct new non-immunogenic genetic control elements. For example, the aptamer (molecular recognition) domain can be swapped with other non-natural aptamers (or otherwise modified) such that the new recognition domain causes genetic modulation with user-defined effector compounds. The changed switches become part of a therapy regimen, turning on, turning off, or regulating protein synthesis. Newly constructed genetic regulation networks can be applied in such areas as living biosensors, metabolic engineering of organisms, and advanced forms of gene therapy treatments.

Messenger RNAs are typically thought of as passive carriers of genetic information that are acted upon by protein- or small RNA-regulatory factors and by ribosomes during the process of translation. It was discovered that certain mRNAs carry natural aptamer domains and that binding of specific metabolites directly to these RNA domains leads to modulation of gene expression. Natural riboswitches exhibit two surprising functions that are not typically associated with natural RNAs. First, the mRNA element can adopt distinct structural states wherein one structure serves as a precise binding pocket for its target metabolite. Second, the metabolite-induced allosteric interconversion between structural states causes a change in the level of gene expression by one of several distinct mechanisms. Riboswitches typically can be dissected into two separate domains: one that selectively binds the target (aptamer domain) and another that influences genetic control (expression platform). It is the dynamic interplay between these two domains that results in metabolite-dependent allosteric control of gene expression.

As disclosed herein, distinct classes of riboswitches have been identified and are shown to selectively recognize activating compounds (referred to herein as trigger molecules). For example, coenzyme B₁₂, thiamine pyrophosphate (TPP), and flavin mononucleotide (FMN) activate riboswitches present in genes encoding key enzymes in metabolic or transport pathways of these compounds. The aptamer domain of each riboswitch class conforms to a highly conserved consensus sequence and structure. Thus, sequence homology searches can be used to identify related riboswitch domains. Riboswitch domains have been discovered in various organisms from bacteria, archaea, and eukarya.
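By way of illustration only, the use of a consensus sequence to locate related domains can be sketched in Python as follows. The sketch assumes a linear consensus written with the degenerate symbols R (purine), Y (pyrimidine), and N (any nucleotide) used in the figure legends. The function names and the short example sequences are hypothetical and are not taken from the disclosed SEQ ID NOs. A practical search would also need to account for conserved base pairing and for variable-length insertions, which this sketch ignores.

```python
import re

# Degenerate nucleotide symbols: R = purine, Y = pyrimidine, N = any nucleotide.
SYMBOL_TO_CLASS = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "N": "[ACGU]",
}

def consensus_to_pattern(consensus):
    """Translate a consensus string into a regular-expression pattern string."""
    return "".join(SYMBOL_TO_CLASS[symbol] for symbol in consensus.upper())

def find_matches(consensus, sequence):
    """Return 0-based start positions where the consensus matches the sequence.

    DNA input is accepted: T is converted to U before matching. A lookahead
    is used so that overlapping matches are all reported.
    """
    rna = sequence.upper().replace("T", "U")
    lookahead = "(?=" + consensus_to_pattern(consensus) + ")"
    return [match.start() for match in re.finditer(lookahead, rna)]

# Hypothetical toy example: a three-symbol consensus against an 8-nt sequence.
positions = find_matches("GRY", "UGACGGUC")
```

In this toy case the consensus GRY matches at positions 1 (GAC) and 4 (GGU) of the example sequence.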

One class of riboswitches that recognizes guanine and discriminates against most other purine analogs has been discovered. Representative RNAs that carry the consensus sequence and structural features of guanine riboswitches are located in the 5′-untranslated region (UTR) of numerous genes of prokaryotes, where they control expression of proteins involved in purine salvage and biosynthesis. Three representatives of this phylogenetic collection bind adenine with values for apparent dissociation constant (apparent K_(D)) that are several orders of magnitude better than for guanine. The preference for adenine is due to a single nucleotide substitution in the core of the riboswitch, wherein each representative most likely recognizes its corresponding ligand by forming a Watson/Crick base pair. In addition, the adenine-specific riboswitch associated with the ydhL gene of Bacillus subtilis functions as a genetic ‘ON’ switch, wherein adenine binding causes a structural rearrangement that precludes formation of an intrinsic transcription terminator stem. Guanine-sensing riboswitches are a class of RNA genetic control elements that modulate gene expression in response to changing concentrations of this compound.

It was discovered that the 5′-untranslated sequence of the Escherichia coli btuB mRNA assumes a more proactive role in metabolic monitoring and genetic control. The mRNA serves as a metabolite-sensing genetic switch by selectively binding coenzyme B₁₂ without the need for proteins. This binding event establishes a distinct RNA structure that is likely to be responsible for inhibition of ribosome binding and consequent reduction in synthesis of the cobalamin transport protein BtuB. This discovery, along with related observations described herein, supports the hypothesis that metabolic monitoring through RNA-metabolite interactions is a widespread mechanism of genetic control.

RNA structure probing data indicate that the thiamine pyrophosphate (TPP) riboswitch operates as an allosteric sensor of its target compound, wherein binding of TPP by the aptamer domain stabilizes a conformational state within the aptamer and within the neighboring expression platform that precludes translation. The diversity of expression platforms appears to be expansive. The thiM RNA uses a Shine-Dalgarno (SD)-blocking mechanism to control translation. In contrast, the thiC RNA controls gene expression both at transcription and translation, and therefore might make use of a somewhat more complex expression platform that converts the TPP binding event into a transcription termination event and into inhibition of translation of completed mRNAs.

A. General Organization of Riboswitch RNAs

Bacterial riboswitch RNAs are genetic control elements that are located primarily within the 5′-untranslated region (5′-UTR) of the main coding region of a particular mRNA. Structural probing studies (discussed further below) reveal that riboswitch elements are generally composed of two domains: a natural aptamer (T. Hermann, D. J. Patel, Science 2000, 287, 820; L. Gold, et al., Annual Review of Biochemistry 1995, 64, 763) that serves as the ligand-binding domain, and an ‘expression platform’ that interfaces with RNA elements that are involved in gene expression (e.g. Shine-Dalgarno (SD) elements; transcription terminator stems). These conclusions are drawn from the observation that aptamer domains synthesized in vitro bind the appropriate ligand in the absence of the expression platform (see Examples 2, 3 and 6). Moreover, structural probing investigations suggest that the aptamer domain of most riboswitches adopts a particular secondary- and tertiary-structure fold when examined independently that is essentially identical to the aptamer structure when examined in the context of the entire 5′ leader RNA. This implies that, in many cases, the aptamer domain is a modular unit that folds independently of the expression platform (see Examples 2, 3 and 6).

Ultimately, the ligand-bound or unbound status of the aptamer domain is interpreted through the expression platform, which is responsible for exerting an influence upon gene expression. The view of a riboswitch as a modular element is further supported by the fact that aptamer domains are highly conserved amongst various organisms (and even between kingdoms, as is observed for the TPP riboswitch) (N. Sudarsan, et al., RNA 2003, 9, 644), whereas the expression platform varies in sequence, structure, and in the mechanism by which expression of the appended open reading frame is controlled. For example, ligand binding to the TPP riboswitch of the tenA mRNA of B. subtilis causes transcription termination (A. S. Mironov, et al., Cell 2002, 111, 747). This expression platform is distinct in sequence and structure compared to the expression platform of the TPP riboswitch in the thiM mRNA from E. coli, wherein TPP binding causes inhibition of translation by a SD blocking mechanism (see Example 2). The TPP aptamer domain is easily recognizable and of near identical functional character between these two transcriptional units, but the genetic control mechanisms and the expression platforms that carry them out are very different.

Aptamer domains for riboswitch RNAs typically range from ˜70 to 170 nt in length (FIG. 11). This observation was somewhat unexpected given that in vitro evolution experiments identified a wide variety of small molecule-binding aptamers, which are considerably shorter in length and structural intricacy (T. Hermann, D. J. Patel, Science 2000, 287, 820; L. Gold, et al., Annual Review of Biochemistry 1995, 64, 763; M. Famulok, Current Opinion in Structural Biology 1999, 9, 324). Although the reasons for the substantial increase in complexity and information content of the natural aptamer sequences relative to artificial aptamers remain to be proven, this complexity is most likely required to form RNA receptors that function with high affinity and selectivity. Apparent K_(D) values for the ligand-riboswitch complexes range from low nanomolar to low micromolar. It is also worth noting that some aptamer domains, when isolated from the appended expression platform, exhibit improved affinity for the target ligand over that of the intact riboswitch (˜10 to 100-fold) (see Example 2). Presumably, there is an energetic cost in sampling the multiple distinct RNA conformations required by a fully intact riboswitch RNA, which is reflected by a loss in ligand affinity. Since the aptamer domain must serve as a molecular switch, this might also add to the functional demands on natural aptamers that might help rationalize their more sophisticated structures.
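As a minimal numerical sketch of what an apparent K_(D) represents, the following Python code assumes a simple one-site binding model in which ligand is in large excess over RNA, so that the fraction of RNA bound equals [L]/([L] + K_(D)). The model, the function names, and the example concentrations are illustrative assumptions and are not data from the Examples. The second function estimates the ligand concentration that gives half-maximal modulation by interpolating on a logarithmic concentration axis, in the manner of the plots described for FIG. 31B.

```python
import math

def fraction_bound(ligand_conc, kd):
    """Fraction of RNA bound under an assumed one-site model (ligand in excess)."""
    return ligand_conc / (ligand_conc + kd)

def estimate_apparent_kd(concentrations, fractions):
    """Estimate the concentration giving half-maximal modulation.

    `concentrations` must be increasing and `fractions` are normalized values
    between 0 and 1. The estimate is a linear interpolation on a log10
    concentration axis between the two points that bracket 0.5.
    """
    points = list(zip(concentrations, fractions))
    for (c_low, f_low), (c_high, f_high) in zip(points, points[1:]):
        if f_low <= 0.5 <= f_high:
            if f_high == f_low:
                return c_low
            t = (0.5 - f_low) / (f_high - f_low)
            log_c = math.log10(c_low) + t * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_c
    raise ValueError("normalized fractions never cross 0.5")

# Hypothetical titration simulated with an assumed K_D of 1 micromolar.
assumed_kd = 1e-6
titration = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]
simulated = [fraction_bound(c, assumed_kd) for c in titration]
recovered_kd = estimate_apparent_kd(titration, simulated)
```

Under this model, a 10-fold improvement in affinity corresponds to a 10-fold lower ligand concentration being needed to reach half-maximal binding.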

B. Riboswitch Regulation of Transcription Termination in Bacteria

Bacteria primarily make use of two methods for termination of transcription. Certain genes incorporate a termination signal that is dependent upon the Rho protein (J. P. Richardson, Biochimica et Biophysica Acta 2002, 1577, 251), while others make use of Rho-independent terminators (intrinsic terminators) to destabilize the transcription elongation complex (I. Gusarov, E. Nudler, Molecular Cell 1999, 3, 495; E. Nudler, M. E. Gottesman, Genes to Cells 2002, 7, 755). The latter RNA elements are composed of a GC-rich stem-loop followed by a stretch of 6-9 uridyl residues. Intrinsic terminators are widespread throughout bacterial genomes (F. Lillo, et al., 2002, 18, 971), and are typically located at the 3′-termini of genes or operons. Interestingly, an increasing number of examples are being observed for intrinsic terminators located within 5′-UTRs.
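The architecture of an intrinsic terminator described above (a GC-rich stem-loop followed by 6-9 uridyl residues) can be illustrated with a deliberately simplified Python sketch. The stem length range, loop length range, GC threshold, restriction to Watson-Crick pairs, and the example sequence are all assumptions chosen for illustration. The sketch does not model folding energetics and is not a method disclosed herein.

```python
import re

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def hairpin_before(seq, end, min_stem=5, max_stem=12, min_loop=3, max_loop=8):
    """Look for a perfect Watson-Crick hairpin whose 3' arm ends just before `end`.

    Returns (start, stem_length, loop_length, gc_fraction) for the longest stem
    found, or None. Only uninterrupted stems are considered (an assumption).
    """
    for stem in range(max_stem, min_stem - 1, -1):
        for loop in range(min_loop, max_loop + 1):
            start = end - (2 * stem + loop)
            if start < 0:
                continue
            five_arm = seq[start:start + stem]
            three_arm = seq[end - stem:end]
            if all(COMPLEMENT.get(a) == b for a, b in zip(five_arm, reversed(three_arm))):
                gc = sum(base in "GC" for base in five_arm + three_arm) / (2 * stem)
                return start, stem, loop, gc
    return None

def candidate_terminators(sequence, min_gc=0.6):
    """Report GC-rich hairpins that are immediately followed by a run of 6-9 U."""
    rna = sequence.upper().replace("T", "U")
    hits = []
    for tract in re.finditer(r"U{6,9}", rna):
        hairpin = hairpin_before(rna, tract.start())
        if hairpin is not None and hairpin[3] >= min_gc:
            start, stem, loop, _ = hairpin
            hits.append({"hairpin_start": start, "stem": stem,
                         "loop": loop, "u_tract": tract.group()})
    return hits

# Hypothetical sequence: 6-bp GC stem, 4-nt loop, then eight uridines.
example = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUUU" + "AA"
hits = candidate_terminators(example)
```

For the hypothetical sequence, the sketch reports one candidate with a 6-base-pair stem and 4-nucleotide loop beginning at position 4.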

Amongst the wide variety of genetic regulatory strategies employed by bacteria there is a growing class of examples wherein RNA polymerase responds to a termination signal within the 5′-UTR in a regulated fashion (T. M. Henkin, Current Opinion in Microbiology 2000, 3, 149). During certain conditions the RNA polymerase complex is directed by external signals either to perceive or to ignore the termination signal. Although transcription initiation might occur without regulation, control over mRNA synthesis (and of gene expression) is ultimately dictated by regulation of the intrinsic terminator. Presumably, one of at least two mutually exclusive mRNA conformations results in the formation or disruption of the RNA structure that signals transcription termination. A trans-acting factor, which in some instances is an RNA (F. J. Grundy, et al., Proceedings of the National Academy of Sciences of the United States of America 2002, 99, 11121; T. M. Henkin, C. Yanofsky, Bioessays 2002, 24, 700) and in others is a protein (J. Stulke, Archives of Microbiology 2002, 177, 433), is generally required for receiving a particular intracellular signal and subsequently stabilizing one of the RNA conformations. Riboswitches offer a direct link between RNA structure modulation and the metabolite signals that are interpreted by the genetic control machinery. A brief overview of the FMN riboswitch from a B. subtilis mRNA is provided below to illustrate this mechanism.

It was discovered that certain mRNAs involved in thiamine biosynthesis bind to thiamine (vitamin B₁) or its bioactive pyrophosphate derivative (TPP) without the participation of protein factors. The mRNA-effector complex adopts a distinct structure that sequesters the ribosome-binding site and leads to a reduction in gene expression. This metabolite-sensing mRNA system provides an example of a genetic “riboswitch” (referred to herein as a riboswitch) whose origin might predate the evolutionary emergence of proteins. It has been discovered that the mRNA leader sequence of the btuB gene of Escherichia coli can bind coenzyme B₁₂ selectively, and that this binding event brings about a structural change in the RNA that is important for genetic control (see Example 1). It was also discovered that mRNAs that encode thiamine biosynthetic proteins also employ a riboswitch mechanism (see Example 2).

It was also discovered that the 5′-UTR of the lysC gene of Bacillus subtilis carries a conserved RNA element that serves as a lysine-responsive riboswitch. The ligand-binding domain of the riboswitch binds to L-lysine with an apparent dissociation constant (K_(D)) of approximately 1 μM, and exhibits a high level of molecular discrimination against closely related analogs including D-lysine and ornithine. This widespread class of riboswitches serves as a target for the antimicrobial agent thiosine.

It was also discovered that the xpt-pbuX operon (Christiansen, L. C., et al., 1997, J. Bacteriol. 179, 2540-2550) is controlled by a riboswitch that exhibits high affinity and high selectivity for guanine. This class of riboswitches is present in the 5′-untranslated region (5′-UTR) of five transcriptional units in B. subtilis, including that of the 12-gene pur operon. Direct binding of guanine by mRNAs serves as a critical determinant of metabolic homeostasis for purine metabolism in certain bacteria. Furthermore, the discovered classes of riboswitches, which respond to seven distinct target molecules, control at least 68 genes in Bacillus subtilis that are of fundamental importance to central metabolic pathways.

It was discovered that a highly conserved RNA domain termed the S box serves as a selective aptamer for SAM. Allosteric modulation of secondary and tertiary structures is induced upon SAM binding to the aptamer domain, and these structural changes are responsible for inducing termination of mRNA transcription.

A variant class of riboswitches that responds to adenine is also disclosed. These riboswitches carry an aptamer domain that corresponds closely in sequence and secondary structure to the guanine aptamer. However, each representative of the adenine sub-class of riboswitches carries a C to U mutation in the conserved core of the aptamer, indicating that this residue is involved in metabolite recognition. The identity of this single nucleotide determines the binding specificity between guanine and adenine, which provides an example of how complex riboswitch structures can be mutated to recognize new metabolite targets.

Although the specific natural riboswitches disclosed herein are the first examples of mRNA elements that control genetic expression by metabolite binding, it is expected that this genetic control strategy is widespread in biology. It has been suggested (White III, Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 7, 101-104 (1976); White III, In: The Pyridine Nucleotide Coenzymes. Acad. Press, NY pp. 1-17 (1982); Benner et al., Modern metabolism as a palimpsest of the RNA world. Proc. Natl. Acad. Sci. USA 86, 7054-7058 (1989)) that TPP, coenzyme B₁₂ and FMN emerged as biological cofactors during the RNA world (Joyce, The antiquity of RNA-based evolution. Nature 418, 214-221 (2002)). If these metabolites were being biosynthesized and used before the advent of proteins, then certain riboswitches might be modern examples of the most ancient form of genetic control. A search of genomic sequence databases has revealed that sequences corresponding to the TPP aptamer exist in organisms from bacteria, archaea and eukarya, largely without major alteration. Although new metabolite-binding mRNAs are likely to emerge as evolution progresses, it is possible that the known riboswitches are molecular fossils from the RNA world.

Disclosed are mRNA elements that have been identified in fungi and in plants that match the consensus sequence and structure of thiamine pyrophosphate-binding domains of prokaryotes. In Arabidopsis, the consensus motif resides in the 3′-UTR of a thiamine biosynthetic gene, and the isolated RNA domain binds the corresponding coenzyme in vitro. These results indicate that metabolite-binding mRNAs are involved in eukaryotic gene regulation and that some riboswitches might be representatives of an ancient form of genetic control.

It is to be understood that the disclosed method and compositions are not limited to specific synthetic methods, specific analytical techniques, or to particular reagents unless otherwise specified, and, as such, can vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

Materials

Disclosed are materials, compositions, and components that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed methods and compositions. These and other materials are disclosed herein, and it is understood that when combinations, subsets, interactions, groups, etc. of these materials are disclosed that while specific reference to each of various individual and collective combinations and permutation of these compounds cannot be explicitly disclosed, each is specifically contemplated and described herein. For example, if a riboswitch or aptamer domain is disclosed and discussed and a number of modifications that can be made to a number of molecules including the riboswitch or aptamer domain are discussed, each and every combination and permutation of riboswitch or aptamer domain and the modifications that are possible are specifically contemplated unless specifically indicated to the contrary. Thus, if a class of molecules A, B, and C are disclosed as well as a class of molecules D, E, and F and an example of a combination molecule, A-D is disclosed, then even if each is not individually recited, each is individually and collectively contemplated. Thus, in this example, each of the combinations A-E, A-F, B-D, B-E, B-F, C-D, C-E, and C-F are specifically contemplated and should be considered disclosed from disclosure of A, B, and C; D, E, and F; and the example combination A-D. Likewise, any subset or combination of these is also specifically contemplated and disclosed. Thus, for example, the sub-group of A-E, B-F, and C-E are specifically contemplated and should be considered disclosed from disclosure of A, B, and C; D, E, and F; and the example combination A-D. This concept applies to all aspects of this application including, but not limited to, steps in methods of making and using the disclosed compositions. Thus, if there are a variety of additional steps that can be performed it is understood that each of these additional steps can be performed with any specific embodiment or combination of embodiments of the disclosed methods, and that each such combination is specifically contemplated and should be considered disclosed.

A. Riboswitches

Riboswitches are expression control elements that are part of the RNA molecule to be expressed and that change state when bound by a trigger molecule. Riboswitches typically can be dissected into two separate domains: one that selectively binds the target (aptamer domain) and another that influences genetic control (expression platform domain). It is the dynamic interplay between these two domains that results in metabolite-dependent allosteric control of gene expression. Disclosed are isolated and recombinant riboswitches, recombinant constructs containing such riboswitches, heterologous sequences operably linked to such riboswitches, and cells and transgenic organisms harboring such riboswitches, riboswitch recombinant constructs, and riboswitches operably linked to heterologous sequences. The heterologous sequences can be, for example, sequences encoding proteins or peptides of interest, including reporter proteins or peptides. Preferred riboswitches are, or are derived from, naturally occurring riboswitches.

The disclosed riboswitches, including the derivatives and recombinant forms thereof, generally can be from any source, including naturally occurring riboswitches and riboswitches designed de novo. Any such riboswitches can be used in or with the disclosed methods. However, different types of riboswitches can be defined and some such sub-types can be useful in or with particular methods (generally as described elsewhere herein). Types of riboswitches include, for example, naturally occurring riboswitches, derivatives and modified forms of naturally occurring riboswitches, chimeric riboswitches, and recombinant riboswitches. A naturally occurring riboswitch is a riboswitch having the sequence of a riboswitch as found in nature. Such a naturally occurring riboswitch can be an isolated or recombinant form of the naturally occurring riboswitch as it occurs in nature. That is, the riboswitch has the same primary structure but has been isolated or engineered in a new genetic or nucleic acid context. Chimeric riboswitches can be made up of, for example, part of a riboswitch of any or of a particular class or type of riboswitch and part of a different riboswitch of the same or of any different class or type of riboswitch; or part of a riboswitch of any or of a particular class or type of riboswitch and any non-riboswitch sequence or component. Recombinant riboswitches are riboswitches that have been isolated or engineered in a new genetic or nucleic acid context.

Different classes of riboswitches refer to riboswitches that have the same or similar trigger molecules or riboswitches that have the same or similar overall structure (predicted, determined, or a combination). Riboswitches of the same class generally have, but need not have, both the same or similar trigger molecules and the same or similar overall structure.

Also disclosed are chimeric riboswitches containing heterologous aptamer domains and expression platform domains. That is, chimeric riboswitches are made up of an aptamer domain from one source and an expression platform domain from another source. The heterologous sources can be from, for example, different specific riboswitches, different types of riboswitches, or different classes of riboswitches. The heterologous aptamers can also come from non-riboswitch aptamers. The heterologous expression platform domains can also come from non-riboswitch sources.

Riboswitches can be modified from other known, developed or naturally-occurring riboswitches. For example, switch domain portions can be modified by changing one or more nucleotides while preserving the known or predicted secondary, tertiary, or both secondary and tertiary structure of the riboswitch. For example, both nucleotides in a base pair can be changed to nucleotides that can also base pair. Changes that allow retention of base pairing are referred to herein as base pair conservative changes.
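
By way of illustration only, a base pair conservative change can be checked computationally. The following is a minimal sketch in Python, not a method taken from this disclosure. It assumes that the paired positions of the stem are already known, and it assumes that G-U wobble pairs count as pairing in addition to Watson-Crick pairs. The function names and the example hairpin are hypothetical.

```python
# Pairs treated as able to base pair in this sketch: Watson-Crick pairs plus
# the G-U wobble pair (including wobble pairs is an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if nucleotides a and b can base pair."""
    return (a, b) in PAIRS


def is_base_pair_conservative(original, modified, paired_positions):
    """Return True if every listed pair (i, j) can still base pair in `modified`.

    `paired_positions` holds 0-based index pairs taken from the known or
    predicted secondary structure of `original`.
    """
    if len(original) != len(modified):
        return False
    return all(can_pair(modified[i], modified[j]) for i, j in paired_positions)


# Hypothetical 10-nucleotide hairpin with a three-base-pair stem.
original = "GGCAAAAGCC"
stem = [(0, 9), (1, 8), (2, 7)]

# Changing both partners of the second pair (G-C to A-U) retains pairing.
print(is_base_pair_conservative(original, "GACAAAAGUC", stem))
# Changing only one partner (G-C to A-C) does not.
print(is_base_pair_conservative(original, "GACAAAAGCC", stem))
```

The check considers only the listed stem positions; it does not predict whether the modified sequence would adopt the same overall fold.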

Modified or derivative riboswitches can also be produced using in vitro selection and evolution techniques. In general, in vitro evolution techniques as applied to riboswitches involve producing a set of variant riboswitches where part(s) of the riboswitch sequence is varied while other parts of the riboswitch are held constant. Activation, deactivation or blocking (or other functional or structural criteria) of the set of variant riboswitches can then be assessed and those variant riboswitches meeting the criteria of interest are selected for use or further rounds of evolution. Useful base riboswitches for generation of variants are the specific and consensus riboswitches disclosed herein. Consensus riboswitches can be used to inform which part(s) of a riboswitch to vary for in vitro selection and evolution.
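
The production of a set of variants in which some positions are varied while the rest are held constant, followed by selection against a criterion, can be sketched in silico as follows. This Python sketch is illustrative only and stands in for what would be a laboratory procedure. The function names, the short base sequence, and the toy scoring function are hypothetical; in practice the score would come from an activation, deactivation or blocking assay.

```python
from itertools import product


def variant_library(base, varied_positions, alphabet="ACGU"):
    """Enumerate every variant of `base` in which only the listed 0-based
    positions are varied; all other positions are held constant."""
    positions = sorted(set(varied_positions))
    variants = []
    for combo in product(alphabet, repeat=len(positions)):
        seq = list(base)
        for pos, nucleotide in zip(positions, combo):
            seq[pos] = nucleotide
        variants.append("".join(seq))
    return variants


def select(variants, score, threshold):
    """Keep the variants whose score meets the criterion of interest."""
    return [v for v in variants if score(v) >= threshold]


# Hypothetical base sequence with positions 2 and 3 varied (4**2 = 16 variants).
library = variant_library("GGCAUC", [2, 3])
print(len(library))

# Toy stand-in for an assay readout: the number of G residues in the variant.
survivors = select(library, lambda v: v.count("G"), 4)
print(survivors)
```

Selected variants can be fed back into `variant_library` as new base sequences to model further rounds of evolution.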

Also disclosed are modified riboswitches with altered regulation. The regulation of a riboswitch can be altered by operably linking an aptamer domain to the expression platform domain of the riboswitch (which is a chimeric riboswitch). The aptamer domain can then mediate regulation of the riboswitch through the action of, for example, a trigger molecule for the aptamer domain. Aptamer domains can be operably linked to expression platform domains of riboswitches in any suitable manner, including, for example, by replacing the normal or natural aptamer domain of the riboswitch with the new aptamer domain. Generally, any compound or condition that can activate, deactivate or block the riboswitch from which the aptamer domain is derived can be used to activate, deactivate or block the chimeric riboswitch.

Also disclosed are inactivated riboswitches. Riboswitches can be inactivated by covalently altering the riboswitch (by, for example, crosslinking parts of the riboswitch or coupling a compound to the riboswitch). Inactivation of a riboswitch in this manner can result from, for example, an alteration that prevents the trigger molecule for the riboswitch from binding, that prevents the change in state of the riboswitch upon binding of the trigger molecule, or that prevents the expression platform domain of the riboswitch from affecting expression upon binding of the trigger molecule.

Also disclosed are biosensor riboswitches. Biosensor riboswitches are engineered riboswitches that produce a detectable signal in the presence of their cognate trigger molecule. Useful biosensor riboswitches can be triggered at or above threshold levels of the trigger molecules. Biosensor riboswitches can be designed for use in vivo or in vitro. For example, biosensor riboswitches operably linked to a reporter RNA that encodes a protein that serves as or is involved in producing a signal can be used in vivo by engineering a cell or organism to harbor a nucleic acid construct encoding the riboswitch/reporter RNA. An example of a biosensor riboswitch for use in vitro is a riboswitch that includes a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a biosensor riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch. Biosensor riboswitches can be used in various situations and platforms. For example, biosensor riboswitches can be used with solid supports, such as plates, chips, strips and wells.

Also disclosed are modified or derivative riboswitches that recognize new trigger molecules. New riboswitches and/or new aptamers that recognize new trigger molecules can be selected for, designed or derived from known riboswitches. This can be accomplished by, for example, producing a set of aptamer variants in a riboswitch, assessing the activation of the variant riboswitches in the presence of a compound of interest, selecting variant riboswitches that were activated (or, for example, the riboswitches that were the most highly or the most selectively activated), and repeating these steps until a variant riboswitch of a desired activity, specificity, combination of activity and specificity, or other combination of properties results.

Particularly useful aptamer domains can form a stem structure referred to herein as the P1 stem structure (or simply P1). The P1 stems of a variety of riboswitches are shown in FIG. 11 (and in other figures). The hybridizing strands in the P1 stem structure are referred to as the aptamer strand (also referred to as the P1a strand) and the control strand (also referred to as the P1b strand). The control strand can form a stem structure with both the aptamer strand and a sequence in a linked expression platform that is referred to as the regulated strand (also referred to as the P1c strand). Thus, the control strand (P1b) can form alternative stem structures with the aptamer strand (P1a) and the regulated strand (P1c). Activation and deactivation of a riboswitch results in a shift from one of the stem structures to the other (from P1a/P1b to P1b/P1c or vice versa). The formation of the P1b/P1c stem structure affects expression of the RNA molecule containing the riboswitch. Riboswitches that operate via this control mechanism are referred to herein as alternative stem structure riboswitches (or as alternative stem riboswitches).

In general, any aptamer domain can be adapted for use with any expression platform domain by designing or adapting a regulated strand in the expression platform domain to be complementary to the control strand of the aptamer domain. Alternatively, the sequence of the aptamer and control strands of an aptamer domain can be adapted so that the control strand is complementary to a functionally significant sequence in an expression platform. For example, the control strand can be adapted to be complementary to the Shine-Dalgarno sequence of an RNA such that, upon formation of a stem structure between the control strand and the SD sequence, the SD sequence becomes inaccessible to ribosomes, thus reducing or preventing translation initiation. Note that the aptamer strand would have corresponding changes in sequence to allow formation of a P1 stem in the aptamer domain.
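
The complementarity relationships described above can be sketched as a simple sequence calculation. The Python sketch below is illustrative only. It assumes perfect Watson-Crick complementarity over the full length of the strands, which an actual design need not have. The function names are hypothetical, and the six-nucleotide Shine-Dalgarno-like sequence is used purely as an example input rather than as a sequence taken from this disclosure.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that forms a perfect antiparallel Watson-Crick
    duplex with `rna` (both strands written 5' to 3')."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))


def design_control_strand(regulated_strand):
    """Control strand (P1b) designed to pair with the regulated strand (P1c)."""
    return reverse_complement(regulated_strand)


def design_aptamer_strand(control_strand):
    """Aptamer strand (P1a) adjusted so that it still pairs with the
    control strand (P1b) to form the P1 stem."""
    return reverse_complement(control_strand)


# Illustrative Shine-Dalgarno-like sequence used as the regulated strand.
sd_sequence = "AGGAGG"
control = design_control_strand(sd_sequence)
aptamer = design_aptamer_strand(control)
print(control, aptamer)
```

Because the aptamer strand and the regulated strand are each complementary to the same control strand, a perfect-complement design gives them the same sequence. A practical design would likely tune stem lengths and allow mismatches so that the two alternative stems differ in stability as needed.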

As another example, a transcription terminator can be added to an RNA molecule (most conveniently in an untranslated region of the RNA) where part of the sequence of the transcription terminator is complementary to the control strand of an aptamer domain (the sequence will be the regulated strand). This will allow the control sequence of the aptamer domain to form alternative stem structures with the aptamer strand and the regulated strand, thus either forming or disrupting a transcription terminator stem upon activation or deactivation of the riboswitch. Any other expression element can be brought under the control of a riboswitch by similar design of alternative stem structures.

For transcription terminators controlled by riboswitches, the speed of transcription and spacing of the riboswitch and expression platform elements can be important for proper control. Transcription speed can be adjusted by, for example, including polymerase pausing elements (e.g., a series of uridine residues) to pause transcription and allow the riboswitch to form and sense trigger molecules. For example, with the FMN riboswitch, if FMN is bound to its aptamer domain, then the antiterminator sequence is sequestered and is unavailable for formation of an antiterminator structure (FIG. 12). However, if FMN is absent, the antiterminator can form once its nucleotides emerge from the polymerase. RNAP then breaks free of the pause site only to reach another U-stretch and pause again. The transcriptional terminator then forms only if the terminator nucleotides are not tied up by the antiterminator.

Disclosed are regulatable gene expression constructs comprising a nucleic acid molecule encoding an RNA comprising a riboswitch operably linked to a coding region, wherein the riboswitch regulates expression of the RNA, wherein the riboswitch and coding region are heterologous. The riboswitch can comprise an aptamer domain and an expression platform domain, wherein the aptamer domain and the expression platform domain are heterologous. The riboswitch can comprise an aptamer domain and an expression platform domain, wherein the aptamer domain comprises a P1 stem, wherein the P1 stem comprises an aptamer strand and a control strand, wherein the expression platform domain comprises a regulated strand, wherein the regulated strand, the control strand, or both have been designed to form a stem structure.

Disclosed are riboswitches, wherein the riboswitch is a non-natural derivative of a naturally-occurring riboswitch. The riboswitch can comprise an aptamer domain and an expression platform domain, wherein the aptamer domain and the expression platform domain are heterologous. The riboswitch can be derived from a naturally-occurring guanine-responsive riboswitch, adenine-responsive riboswitch, lysine-responsive riboswitch, thiamine pyrophosphate-responsive riboswitch, adenosylcobalamin-responsive riboswitch, flavin mononucleotide-responsive riboswitch, or an S-adenosylmethionine-responsive riboswitch. The riboswitch can be activated by a trigger molecule, wherein the riboswitch produces a signal when activated by the trigger molecule.

Numerous riboswitches and riboswitch constructs are described and referred to herein. It is specifically contemplated that any specific riboswitch or riboswitch construct or group of riboswitches or riboswitch constructs can be excluded from some aspects of the invention disclosed herein. For example, fusion of the xpt-pbuX riboswitch with a reporter gene could be excluded from a set of riboswitches fused to reporter genes.

1. Aptamer Domains

Aptamers are nucleic acid segments and structures that can bind selectively to particular compounds and classes of compounds. Riboswitches have aptamer domains that, upon binding of a trigger molecule, result in a change in the state or structure of the riboswitch. In functional riboswitches, the state or structure of the expression platform domain linked to the aptamer domain changes when the trigger molecule binds to the aptamer domain. Aptamer domains of riboswitches can be derived from any source, including, for example, natural aptamer domains of riboswitches, artificial aptamers, engineered, selected, evolved or derived aptamers or aptamer domains. Aptamers in riboswitches generally have at least one portion that can interact, such as by forming a stem structure, with a portion of the linked expression platform domain. This stem structure will either form or be disrupted upon binding of the trigger molecule.

Consensus aptamer domains of a variety of natural riboswitches are shown in FIG. 11. These aptamer domains (including all of the direct variants embodied therein) can be used in riboswitches. The consensus sequences and structures indicate variations in sequence and structure. Aptamer domains that are within the indicated variations are referred to herein as direct variants. These aptamer domains can be modified to produce modified or variant aptamer domains. Conservative modifications include any change in base paired nucleotides such that the nucleotides in the pair remain complementary. Moderate modifications include changes in the length of stems or of loops (for which a length or length range is indicated) of less than or equal to 20% of the length range indicated. Loop and stem lengths are considered to be “indicated” where the consensus structure shows a stem or loop of a particular length or where a range of lengths is listed or depicted. Moderate modifications include changes in the length of stems or of loops (for which a length or length range is not indicated) of less than or equal to 40% of the length range indicated. Moderate modifications also include functional variants of unspecified portions of the aptamer domain. Unspecified portions of the aptamer domains are indicated by solid lines in FIG. 11.
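
The 20% and 40% limits on length changes can be expressed as a short check. The Python sketch below is illustrative only and rests on an interpretive assumption: the percentage is measured against a single reference length for the stem or loop. The text refers to a "length range", which could be read differently. The function name and the example lengths are hypothetical.

```python
def is_moderate_length_change(reference_length, new_length, length_indicated):
    """Return True if the change in stem or loop length is within the limit
    for a moderate modification.

    Assumption of this sketch: the limit is 20% of the reference length when
    a length or length range is indicated in the consensus structure, and
    40% when it is not.
    """
    limit_percent = 20 if length_indicated else 40
    change = abs(new_length - reference_length)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return change * 100 <= limit_percent * reference_length


# Hypothetical 10-nucleotide loop whose length is indicated in the consensus.
print(is_moderate_length_change(10, 12, length_indicated=True))
print(is_moderate_length_change(10, 13, length_indicated=True))
# The same loop where no length is indicated.
print(is_moderate_length_change(10, 14, length_indicated=False))
```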

The P1 stem and its constituent strands can be modified in adapting aptamer domains for use with expression platforms and RNA molecules. Such modifications, which can be extensive, are referred to herein as P1 modifications. P1 modifications include changes to the sequence and/or length of the P1 stem of an aptamer domain.

The aptamer domains shown in FIG. 11 (including any direct variants) are particularly useful as initial sequences for producing derived aptamer domains via in vitro selection or in vitro evolution techniques.

Aptamer domains of the disclosed riboswitches can also be used for any other purpose, and in any other context, as aptamers. For example, aptamers can be used to control ribozymes, other molecular switches, and any RNA molecule where a change in structure can affect function of the RNA.

2. Expression Platform Domains

Expression platform domains are a part of riboswitches that affect expression of the RNA molecule that contains the riboswitch. Expression platform domains generally have at least one portion that can interact, such as by forming a stem structure, with a portion of the linked aptamer domain. This stem structure will either form or be disrupted upon binding of the trigger molecule. The stem structure generally either is, or prevents formation of, an expression regulatory structure. An expression regulatory structure is a structure that allows, prevents, enhances or inhibits expression of an RNA molecule containing the structure. Examples include Shine-Dalgarno sequences, initiation codons, transcription terminators, and stability and processing signals.

B. Trigger Molecules

Trigger molecules are molecules and compounds that can activate a riboswitch. This includes the natural or normal trigger molecule for the riboswitch and other compounds that can activate the riboswitch. Natural or normal trigger molecules are the trigger molecule for a given riboswitch in nature or, in the case of some non-natural riboswitches, the trigger molecule for which the riboswitch was designed or with which the riboswitch was selected (as in, for example, in vitro selection or in vitro evolution techniques). Trigger molecules other than the natural or normal trigger molecule can be referred to as non-natural trigger molecules.

C. Compounds

Also disclosed are compounds, and compositions containing such compounds, that can activate, deactivate or block a riboswitch. Riboswitches function to control gene expression through the binding or removal of a trigger molecule. Compounds can be used to activate, deactivate or block a riboswitch. The trigger molecule for a riboswitch (as well as other activating compounds) can be used to activate a riboswitch. Compounds other than the trigger molecule generally can be used to deactivate or block a riboswitch. Riboswitches can also be deactivated by, for example, removing trigger molecules from the presence of the riboswitch. A riboswitch can be blocked by, for example, binding of an analog of the trigger molecule that does not activate the riboswitch.

Also disclosed are compounds for altering expression of an RNA molecule, or of a gene encoding an RNA molecule, where the RNA molecule includes a riboswitch. This can be accomplished by bringing a compound into contact with the RNA molecule. Riboswitches function to control gene expression through the binding or removal of a trigger molecule. Thus, subjecting an RNA molecule of interest that includes a riboswitch to conditions that activate, deactivate or block the riboswitch can be used to alter expression of the RNA. Expression can be altered as a result of, for example, termination of transcription or blocking of ribosome binding to the RNA. Binding of a trigger molecule can, depending on the nature of the riboswitch, reduce or prevent expression of the RNA molecule or promote or increase expression of the RNA molecule.

Also disclosed are compounds for regulating expression of an RNA molecule, or of a gene encoding an RNA molecule. Also disclosed are compounds for regulating expression of a naturally occurring gene or RNA that contains a riboswitch by activating, deactivating or blocking the riboswitch. If the gene is essential for survival of a cell or organism that harbors it, activating, deactivating or blocking the riboswitch can result in death, stasis or debilitation of the cell or organism.

Also disclosed are compounds for regulating expression of an isolated, engineered or recombinant gene or RNA that contains a riboswitch by activating, deactivating or blocking the riboswitch. If the gene encodes a desired expression product, activating or deactivating the riboswitch can be used to induce expression of the gene and thus result in production of the expression product. If the gene encodes an inducer or repressor of gene expression or of another cellular process, activation, deactivation or blocking of the riboswitch can result in induction, repression, or de-repression of other, regulated genes or cellular processes. Many such secondary regulatory effects are known and can be adapted for use with riboswitches. An advantage of riboswitches as the primary control for such regulation is that riboswitch trigger molecules can be small, non-antigenic molecules.

Also disclosed are methods of identifying compounds that activate, deactivate or block a riboswitch. For example, compounds that activate a riboswitch can be identified by bringing into contact a test compound and a riboswitch and assessing activation of the riboswitch. If the riboswitch is activated, the test compound is identified as a compound that activates the riboswitch. Activation of a riboswitch can be assessed in any suitable manner. For example, the riboswitch can be linked to a reporter RNA and expression, expression level, or change in expression level of the reporter RNA can be measured in the presence and absence of the test compound. As another example, the riboswitch can include a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch. As can be seen, assessment of activation of a riboswitch can be performed with the use of a control assay or measurement or without the use of a control assay or measurement. Methods for identifying compounds that deactivate a riboswitch can be performed in analogous ways.

Identification of compounds that block a riboswitch can be accomplished in any suitable manner. For example, an assay can be performed for assessing activation or deactivation of a riboswitch in the presence of a compound known to activate or deactivate the riboswitch and in the presence of a test compound. If activation or deactivation is not observed as would be observed in the absence of the test compound, then the test compound is identified as a compound that blocks activation or deactivation of the riboswitch.
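
The reporter-based assessment of activation and of blocking described above can be sketched as a comparison of reporter expression levels. The Python sketch below is illustrative only. The two-fold cutoff, the function names, and the numerical readings are hypothetical assumptions and are not values taken from this disclosure. Because binding of a trigger molecule can either reduce or increase expression depending on the riboswitch, the sketch treats a sufficiently large change in either direction as a response.

```python
def fold_change(with_compound, without_compound):
    """Reporter expression with the compound relative to expression without it."""
    if without_compound <= 0:
        raise ValueError("baseline expression must be positive")
    return with_compound / without_compound


def is_responsive(change, min_fold=2.0):
    """True if expression rose or fell by at least `min_fold`
    (the two-fold default is an arbitrary illustrative cutoff)."""
    return change >= min_fold or change <= 1.0 / min_fold


def blocks_riboswitch(activator_only, activator_plus_test, baseline, min_fold=2.0):
    """True if the known activator alters reporter expression on its own,
    but not when the test compound is also present."""
    activated = is_responsive(fold_change(activator_only, baseline), min_fold)
    still_activated = is_responsive(fold_change(activator_plus_test, baseline), min_fold)
    return activated and not still_activated


# Hypothetical reporter readings in arbitrary units.
baseline = 1000.0          # no compound
activator_only = 100.0     # known activator represses the reporter ten-fold
activator_plus_test = 900.0

print(is_responsive(fold_change(activator_only, baseline)))
print(blocks_riboswitch(activator_only, activator_plus_test, baseline))
```

A real assay would use replicate measurements and a statistical criterion rather than a fixed fold cutoff.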

Also disclosed are compounds made by identifying a compound that activates, deactivates or blocks a riboswitch and manufacturing the identified compound. This can be accomplished by, for example, combining compound identification methods as disclosed elsewhere herein with methods for manufacturing the identified compounds. For example, compounds can be made by bringing into contact a test compound and a riboswitch, assessing activation of the riboswitch, and, if the riboswitch is activated by the test compound, manufacturing the test compound that activates the riboswitch as the compound.

Also disclosed are compounds made by checking activation, deactivation or blocking of a riboswitch by a compound and manufacturing the checked compound. This can be accomplished by, for example, combining compound activation, deactivation or blocking assessment methods as disclosed elsewhere herein with methods for manufacturing the checked compounds. For example, compounds can be made by bringing into contact a test compound and a riboswitch, assessing activation of the riboswitch, and, if the riboswitch is activated by the test compound, manufacturing the test compound that activates the riboswitch as the compound. Checking compounds for their ability to activate, deactivate or block a riboswitch refers to both identification of compounds previously unknown to activate, deactivate or block a riboswitch and to assessing the ability of a compound to activate, deactivate or block a riboswitch where the compound was already known to activate, deactivate or block the riboswitch.

Specific compounds that can be used to activate riboswitches are also disclosed. Compounds useful with guanine-responsive riboswitches (and riboswitches derived from guanine-responsive riboswitches) include compounds having the formula

where the compound can bind a guanine-responsive riboswitch or derivative thereof, where, when the compound is bound to a guanine-responsive riboswitch or derivative, R₇ serves as a hydrogen bond acceptor, R₁₀ serves as a hydrogen bond donor, R₁₁ serves as a hydrogen bond acceptor, R₁₂ serves as a hydrogen bond donor, where R₁₃ is H, H₂ or is not present, where R₁, R₂, R₃, R₄, R₅, R₆, R₈, and R₉ are each independently C, N, O, or S, and where

each independently represent a single or double bond.

Every compound within the above definition is intended to be and should be considered to be specifically disclosed herein. Further, every subgroup that can be identified within the above definition is intended to be and should be considered to be specifically disclosed herein. As a result, it is specifically contemplated that any compound, or subgroup of compounds can be either specifically included for or excluded from use or included in or excluded from a list of compounds. For example, as one option, a group of compounds is contemplated where each compound is as defined above but is not guanine, hypoxanthine, xanthine, or N²-methylguanine. As another example, a group of compounds is contemplated where each compound is as defined above and is able to activate a guanine-responsive riboswitch.

Compounds useful with adenine-responsive riboswitches (and riboswitches derived from adenine-responsive riboswitches) include compounds having the formula

where the compound can bind an adenine-responsive riboswitch or derivative thereof, where, when the compound is bound to an adenine-responsive riboswitch or derivative, R₁, R₃ and R₇ serve as hydrogen bond acceptors, and R₁₀ and R₁₁ serve as hydrogen bond donors, where R₁₂ is H, H₂ or is not present, where R₁, R₂, R₃, R₄, R₅, R₆, R₈, and R₉ are each independently C, N, O, or S, and where

each independently represent a single or double bond.

Every compound within the above definition is intended to be and should be considered to be specifically disclosed herein. Further, every subgroup that can be identified within the above definition is intended to be and should be considered to be specifically disclosed herein. As a result, it is specifically contemplated that any compound, or subgroup of compounds can be either specifically included for or excluded from use or included in or excluded from a list of compounds. For example, as one option, a group of compounds is contemplated where each compound is as defined above but is not adenine, 2,6-diaminopurine, or 2-aminopurine. As another example, a group of compounds is contemplated where each compound is as defined above and is able to activate an adenine-responsive riboswitch.

Compounds useful with lysine-responsive riboswitches (and riboswitches derived from lysine-responsive riboswitches) include compounds having the formula

where the compound can bind a lysine-responsive riboswitch or derivative thereof, where R₂ and R₃ are each positively charged, where R₁ is negatively charged, where R₄ is C, N, O, or S, and where

each independently represent a single or double bond. Also contemplated are compounds as defined above where R₂ and R₃ are each NH₃⁺ and where R₁ is O⁻.

Every compound within the above definition is intended to be and should be considered to be specifically disclosed herein. Further, every subgroup that can be identified within the above definition is intended to be and should be considered to be specifically disclosed herein. As a result, it is specifically contemplated that any compound, or subgroup of compounds can be either specifically included for or excluded from use or included in or excluded from a list of compounds. For example, as one option, a group of compounds is contemplated where each compound is as defined above but is not lysine. As another example, a group of compounds is contemplated where each compound is as defined above and is able to activate a lysine-responsive riboswitch.

Compounds useful with TPP-responsive riboswitches (and riboswitches derived from TPP-responsive riboswitches) include compounds having the formula

where the compound can bind a TPP-responsive riboswitch or derivative thereof, where R₁ is positively charged, where R₂ and R₃ are each independently C, O, or S, where R₄ is CH₃, NH₂, OH, SH, H or not present, where R₅ is CH₃, NH₂, OH, SH, or H, where R₆ is C or N, and where

each independently represent a single or double bond. Also contemplated are compounds as defined above where R₁ is phosphate, diphosphate or triphosphate.

Every compound within the above definition is intended to be and should be considered to be specifically disclosed herein. Further, every subgroup that can be identified within the above definition is intended to be and should be considered to be specifically disclosed herein. As a result, it is specifically contemplated that any compound, or subgroup of compounds can be either specifically included for or excluded from use or included in or excluded from a list of compounds. For example, as one option, a group of compounds is contemplated where each compound is as defined above but is not TPP, TP or thiamine. As another example, a group of compounds is contemplated where each compound is as defined above and is able to activate a TPP-responsive riboswitch.

D. Constructs, Vectors and Expression Systems

The disclosed riboswitches can be used with any suitable expression system. Recombinant expression is usefully accomplished using a vector, such as a plasmid. The vector can include a promoter operably linked to riboswitch-encoding sequence and RNA to be expressed (e.g., RNA encoding a protein). The vector can also include other elements required for transcription and translation. As used herein, vector refers to any carrier containing exogenous DNA. Thus, vectors are agents that transport the exogenous nucleic acid into a cell without degradation and include a promoter yielding expression of the nucleic acid in the cells into which it is delivered. Vectors include but are not limited to plasmids, viral nucleic acids, viruses, phage nucleic acids, phages, cosmids, and artificial chromosomes. A variety of prokaryotic and eukaryotic expression vectors suitable for carrying riboswitch-regulated constructs can be produced. Such expression vectors include, for example, pET, pET3d, pCR2.1, pBAD, pUC, and yeast vectors. The vectors can be used, for example, in a variety of in vivo and in vitro situations.

Viral vectors include adenovirus, adeno-associated virus, herpes virus, vaccinia virus, polio virus, AIDS virus, neuronal trophic virus, Sindbis and other RNA viruses, including these viruses with the HIV backbone. Also useful are any viral families which share the properties of these viruses which make them suitable for use as vectors. Retroviral vectors, which are described in Verma (1985), include Murine Moloney Leukemia virus, MMLV, and retroviruses that express the desirable properties of MMLV as a vector. Typically, viral vectors contain nonstructural early genes, structural late genes, an RNA polymerase III transcript, inverted terminal repeats necessary for replication and encapsidation, and promoters to control the transcription and replication of the viral genome. When engineered as vectors, viruses typically have one or more of the early genes removed and a gene or gene/promoter cassette is inserted into the viral genome in place of the removed viral DNA.

A “promoter” is generally a sequence or sequences of DNA that function when in a relatively fixed location in regard to the transcription start site. A “promoter” contains core elements required for basic interaction of RNA polymerase and transcription factors and can contain upstream elements and response elements.

“Enhancer” generally refers to a sequence of DNA that functions at no fixed distance from the transcription start site and can be either 5′ (Laimins, 1981) or 3′ (Lusky et al., 1983) to the transcription unit. Furthermore, enhancers can be within an intron (Banerji et al., 1983) as well as within the coding sequence itself (Osborne et al., 1984). They are usually between 10 and 300 bp in length, and they function in cis. Enhancers function to increase transcription from nearby promoters. Enhancers, like promoters, also often contain response elements that mediate the regulation of transcription. Enhancers often determine the regulation of expression.

Expression vectors used in eukaryotic host cells (yeast, fungi, insect, plant, animal, human or nucleated cells) can also contain sequences necessary for the termination of transcription which can affect mRNA expression. These regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding tissue factor protein. The 3′ untranslated regions also include transcription termination sites. It is preferred that the transcription unit also contain a polyadenylation region. One benefit of this region is that it increases the likelihood that the transcribed unit will be processed and transported like mRNA. The identification and use of polyadenylation signals in expression constructs is well established. It is preferred that homologous polyadenylation signals be used in the transgene constructs.

The vector can include nucleic acid sequence encoding a marker product. This marker product is used to determine if the gene has been delivered to the cell and once delivered is being expressed. Preferred marker genes are the E. coli lacZ gene, which encodes β-galactosidase, and the gene encoding green fluorescent protein.

In some embodiments the marker can be a selectable marker. When such selectable markers are successfully transferred into a host cell, the transformed host cell can survive if placed under selective pressure. There are two widely used distinct categories of selective regimes. The first category is based on a cell's metabolism and the use of a mutant cell line which lacks the ability to grow independent of a supplemented media. The second category is dominant selection which refers to a selection scheme used in any cell type and does not require the use of a mutant cell line. These schemes typically use a drug to arrest growth of a host cell. Those cells which have a novel gene would express a protein conveying drug resistance and would survive the selection. Examples of such dominant selection use the drugs neomycin (Southern and Berg, 1982), mycophenolic acid (Mulligan and Berg, 1980) or hygromycin (Sugden et al., 1985).

Gene transfer can be obtained using direct transfer of genetic material in, but not limited to, plasmids, viral vectors, viral nucleic acids, phage nucleic acids, phages, cosmids, and artificial chromosomes, or via transfer of genetic material in cells or carriers such as cationic liposomes. Such methods are well known in the art and readily adaptable for use in the method described herein. Transfer vectors can be any nucleotide construction used to deliver genes into cells (e.g., a plasmid), or as part of a general strategy to deliver genes, e.g., as part of recombinant retrovirus or adenovirus (Ram et al. Cancer Res. 53:83-88, (1993)). Appropriate means for transfection, including viral vectors, chemical transfectants, or physico-mechanical methods such as electroporation and direct diffusion of DNA, are described by, for example, Wolff, J. A., et al., Science, 247, 1465-1468, (1990); and Wolff, J. A. Nature, 352, 815-818, (1991).

1. Viral Vectors

Preferred viral vectors are Adenovirus, Adeno-associated virus, Herpes virus, Vaccinia virus, Polio virus, AIDS virus, neuronal trophic virus, Sindbis and other RNA viruses, including these viruses with the HIV backbone. Also preferred are any viral families which share the properties of these viruses which make them suitable for use as vectors. Preferred retroviruses include Moloney Murine Leukemia Virus (MMLV), and retroviruses that express the desirable properties of MMLV as a vector. Retroviral vectors are able to carry a larger genetic payload, i.e., a transgene or marker gene, than other viral vectors, and for this reason are a commonly used vector. However, they are not useful in non-proliferating cells. Adenovirus vectors are relatively stable and easy to work with, have high titers, can be delivered in aerosol formulation, and can transfect non-dividing cells. Pox viral vectors are large and have several sites for inserting genes; they are thermostable and can be stored at room temperature. A preferred embodiment is a viral vector which has been engineered so as to suppress the immune response of the host organism, elicited by the viral antigens. Preferred vectors of this type will carry coding regions for Interleukin 8 or 10.

Viral vectors have higher transduction abilities (ability to introduce genes) than do most chemical or physical methods of introducing genes into cells. Typically, viral vectors contain nonstructural early genes, structural late genes, an RNA polymerase III transcript, inverted terminal repeats necessary for replication and encapsidation, and promoters to control the transcription and replication of the viral genome. When engineered as vectors, viruses typically have one or more of the early genes removed and a gene or gene/promoter cassette is inserted into the viral genome in place of the removed viral DNA. Constructs of this type can carry up to about 8 kb of foreign genetic material. The necessary functions of the removed early genes are typically supplied by cell lines which have been engineered to express the gene products of the early genes in trans.

i. Retroviral Vectors

A retrovirus is an animal virus belonging to the virus family of Retroviridae, including any types, subfamilies, genus, or tropisms. Retroviral vectors, in general, are described by Verma, I. M., Retroviral vectors for gene transfer. In Microbiology-1985, American Society for Microbiology, pp. 229-232, Washington, (1985), which is incorporated by reference herein. Examples of methods for using retroviral vectors for gene therapy are described in U.S. Pat. Nos. 4,868,116 and 4,980,286; PCT applications WO 90/02806 and WO 89/07136; and Mulligan, (Science 260:926-932 (1993)); the teachings of which are incorporated herein by reference.

A retrovirus is essentially a package which has packed into it nucleic acid cargo. The nucleic acid cargo carries with it a packaging signal, which ensures that the replicated daughter molecules will be efficiently packaged within the package coat. In addition to the package signal, there are a number of molecules which are needed in cis for the replication and packaging of the replicated virus. Typically a retroviral genome contains the gag, pol, and env genes which are involved in the making of the protein coat. It is the gag, pol, and env genes which are typically replaced by the foreign DNA that is to be transferred to the target cell. Retrovirus vectors typically contain a packaging signal for incorporation into the package coat, a sequence which signals the start of the gag transcription unit, elements necessary for reverse transcription, including a primer binding site to bind the tRNA primer of reverse transcription, terminal repeat sequences that guide the switch of RNA strands during DNA synthesis, a purine rich sequence 5′ to the 3′ LTR that serves as the priming site for the synthesis of the second strand of DNA, and specific sequences near the ends of the LTRs that enable the DNA state of the retrovirus to insert into the host genome. The removal of the gag, pol, and env genes allows for about 8 kb of foreign sequence to be inserted into the viral genome, become reverse transcribed, and upon replication be packaged into a new retroviral particle. This amount of nucleic acid is sufficient for the delivery of one to many genes depending on the size of each transcript. It is preferable to include either positive or negative selectable markers along with other genes in the insert.

Since the replication machinery and packaging proteins in most retroviral vectors have been removed (gag, pol, and env), the vectors are typically generated by placing them into a packaging cell line. A packaging cell line is a cell line which has been transfected or transformed with a retrovirus that contains the replication and packaging machinery, but lacks any packaging signal. When the vector carrying the DNA of choice is transfected into these cell lines, the vector containing the gene of interest is replicated and packaged into new retroviral particles, by the machinery provided in trans by the helper cell. The genomes for the machinery are not packaged because they lack the necessary signals.

ii. Adenoviral Vectors

The construction of replication-defective adenoviruses has been described (Berkner et al., J. Virology 61:1213-1220 (1987); Massie et al., Mol. Cell. Biol. 6:2872-2883 (1986); Haj-Ahmad et al., J. Virology 57:267-274 (1986); Davidson et al., J. Virology 61:1226-1239 (1987); Zhang “Generation and identification of recombinant adenovirus by liposome-mediated transfection and PCR analysis” BioTechniques 15:868-872 (1993)). The benefit of the use of these viruses as vectors is that they are limited in the extent to which they can spread to other cell types, since they can replicate within an initial infected cell, but are unable to form new infectious viral particles. Recombinant adenoviruses have been shown to achieve high efficiency gene transfer after direct, in vivo delivery to airway epithelium, hepatocytes, vascular endothelium, CNS parenchyma and a number of other tissue sites (Morsy, J. Clin. Invest. 92:1580-1586 (1993); Kirshenbaum, J. Clin. Invest. 92:381-387 (1993); Roessler, J. Clin. Invest. 92:1085-1092 (1993); Moullier, Nature Genetics 4:154-159 (1993); La Salle, Science 259:988-990 (1993); Gomez-Foix, J. Biol. Chem. 267:25129-25134 (1992); Rich, Human Gene Therapy 4:461-476 (1993); Zabner, Nature Genetics 6:75-83 (1994); Guzman, Circulation Research 73:1201-1207 (1993); Bout, Human Gene Therapy 5:3-10 (1994); Zabner, Cell 75:207-216 (1993); Caillaud, Eur. J. Neuroscience 5:1287-1291 (1993); and Ragot, J. Gen. Virology 74:501-507 (1993)). Recombinant adenoviruses achieve gene transduction by binding to specific cell surface receptors, after which the virus is internalized by receptor-mediated endocytosis, in the same manner as wild type or replication-defective adenovirus (Chardonnet and Dales, Virology 40:462-477 (1970); Brown and Burlingham, J. Virology 12:386-396 (1973); Svensson and Persson, J. Virology 55:442-449 (1985); Seth, et al., J. Virol. 51:650-655 (1984); Seth, et al., Mol. Cell. Biol. 4:1528-1533 (1984); Varga et al., J. Virology 65:6061-6070 (1991); Wickham et al., Cell 73:309-319 (1993)).

A preferred viral vector is one based on an adenovirus which has had the E1 gene removed, and these virions are generated in a cell line such as the human 293 cell line. In another preferred embodiment both the E1 and E3 genes are removed from the adenovirus genome.

Another type of viral vector is based on an adeno-associated virus (AAV). This defective parvovirus is a preferred vector because it can infect many cell types and is nonpathogenic to humans. AAV type vectors can transport about 4 to 5 kb and wild type AAV is known to stably insert into chromosome 19. Vectors which contain this site specific integration property are preferred. An especially preferred embodiment of this type of vector is the P4.1 C vector produced by Avigen, San Francisco, Calif., which can contain the herpes simplex virus thymidine kinase gene, HSV-tk, and/or a marker gene, such as the gene encoding the green fluorescent protein, GFP.
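The payload limits stated above (about 8 kb for retroviral constructs and about 4 to 5 kb for AAV type vectors) lend themselves to a simple size check when planning an insert. The following is a minimal sketch only: the element names and sizes in `cassette` are hypothetical values chosen for illustration, and the AAV limit is taken as the upper end of the stated range.

```python
# Approximate insert capacities in kilobases, taken from the text above.
# The AAV figure uses the upper end of the stated "about 4 to 5 kb" range.
CAPACITY_KB = {
    "retrovirus": 8.0,
    "aav": 5.0,
}


def fits_in_vector(vector, element_sizes_kb):
    """Return True if the summed element sizes do not exceed the vector capacity."""
    total_kb = sum(element_sizes_kb.values())
    return total_kb <= CAPACITY_KB[vector]


# Hypothetical cassette; the sizes are illustrative, not measured values.
cassette = {
    "promoter": 0.65,
    "riboswitch": 0.20,
    "reporter_gene": 1.70,
    "polyadenylation_region": 0.40,
}

if __name__ == "__main__":
    for name in CAPACITY_KB:
        print(name, fits_in_vector(name, cassette))
```

Because the capacities are approximate, a construct close to a limit would still need to be tested empirically.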

The inserted genes in viral and retroviral vectors usually contain promoters and/or enhancers to help control the expression of the desired gene product. A promoter is generally a sequence or sequences of DNA that function when in a relatively fixed location in regard to the transcription start site. A promoter contains core elements required for basic interaction of RNA polymerase and transcription factors, and can contain upstream elements and response elements.

2. Viral Promoters and Enhancers

Preferred promoters controlling transcription from vectors in mammalian host cells can be obtained from various sources, for example, the genomes of viruses such as: polyoma, Simian Virus 40 (SV40), adenovirus, retroviruses, hepatitis-B virus and most preferably cytomegalovirus, or from heterologous mammalian promoters, e.g. beta actin promoter. The early and late promoters of the SV40 virus are conveniently obtained as an SV40 restriction fragment which also contains the SV40 viral origin of replication (Fiers et al., Nature, 273: 113 (1978)). The immediate early promoter of the human cytomegalovirus is conveniently obtained as a HindIII E restriction fragment (Greenway, P. J. et al., Gene 18: 355-360 (1982)). Of course, promoters from the host cell or related species also are useful herein.

Enhancer generally refers to a sequence of DNA that functions at no fixed distance from the transcription start site and can be either 5′ (Laimins, L. et al., Proc. Natl. Acad. Sci. 78: 993 (1981)) or 3′ (Lusky, M. L., et al., Mol. Cell. Bio. 3: 1108 (1983)) to the transcription unit. Furthermore, enhancers can be within an intron (Banerji, J. L. et al., Cell 33: 729 (1983)) as well as within the coding sequence itself (Osborne, T. F., et al., Mol. Cell. Bio. 4: 1293 (1984)). They are usually between 10 and 300 bp in length, and they function in cis. Enhancers function to increase transcription from nearby promoters. Enhancers also often contain response elements that mediate the regulation of transcription. Promoters can also contain response elements that mediate the regulation of transcription. Enhancers often determine the regulation of expression of a gene. While many enhancer sequences are now known from mammalian genes (globin, elastase, albumin, α-fetoprotein and insulin), typically one will use an enhancer from a eukaryotic cell virus. Preferred examples are the SV40 enhancer on the late side of the replication origin (bp 100-270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

The promoter and/or enhancer can be specifically activated either by light or specific chemical events which trigger their function. Systems can be regulated by reagents such as tetracycline and dexamethasone. There are also ways to enhance viral vector gene expression by exposure to irradiation, such as gamma irradiation, or alkylating chemotherapy drugs.

It is preferred that the promoter and/or enhancer region be active in all eukaryotic cell types. A preferred promoter of this type is the CMV promoter (650 bases). Other preferred promoters are SV40 promoters, cytomegalovirus (full length promoter), and retroviral vector LTR.

It has been shown that all specific regulatory elements can be cloned and used to construct expression vectors that are selectively expressed in specific cell types such as melanoma cells. The glial fibrillary acidic protein (GFAP) promoter has been used to selectively express genes in cells of glial origin.

Expression vectors used in eukaryotic host cells (yeast, fungi, insect, plant, animal, human or nucleated cells) can also contain sequences necessary for the termination of transcription which can affect mRNA expression. These regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding tissue factor protein. The 3′ untranslated regions also include transcription termination sites. It is preferred that the transcription unit also contain a polyadenylation region. One benefit of this region is that it increases the likelihood that the transcribed unit will be processed and transported like mRNA. The identification and use of polyadenylation signals in expression constructs is well established. It is preferred that homologous polyadenylation signals be used in the transgene constructs. In a preferred embodiment of the transcription unit, the polyadenylation region is derived from the SV40 early polyadenylation signal and consists of about 400 bases. It is also preferred that the transcribed units contain other standard sequences, alone or in combination with the above sequences, to improve expression from, or stability of, the construct.

3. Markers

The vectors can include nucleic acid sequence encoding a marker product. This marker product is used to determine if the gene has been delivered to the cell and once delivered is being expressed. Preferred marker genes are the E. coli lacZ gene, which encodes β-galactosidase, and the gene encoding green fluorescent protein.

In some embodiments the marker can be a selectable marker. Examples of suitable selectable markers for mammalian cells are dihydrofolate reductase (DHFR), thymidine kinase, neomycin, neomycin analog G418, hygromycin, and puromycin. When such selectable markers are successfully transferred into a mammalian host cell, the transformed mammalian host cell can survive if placed under selective pressure. There are two widely used distinct categories of selective regimes. The first category is based on a cell's metabolism and the use of a mutant cell line which lacks the ability to grow independent of a supplemented media. Two examples are CHO DHFR⁻ cells and mouse LTK⁻ cells. These cells lack the ability to grow without the addition of such nutrients as thymidine or hypoxanthine. Because these cells lack certain genes necessary for a complete nucleotide synthesis pathway, they cannot survive unless the missing nucleotides are provided in a supplemented media. An alternative to supplementing the media is to introduce an intact DHFR or TK gene into cells lacking the respective genes, thus altering their growth requirements. Individual cells which were not transformed with the DHFR or TK gene will not be capable of survival in non-supplemented media.

The second category is dominant selection, which refers to a selection scheme used in any cell type and does not require the use of a mutant cell line. These schemes typically use a drug to arrest growth of a host cell. Those cells which have a novel gene would express a protein conveying drug resistance and would survive the selection. Examples of such dominant selection use the drugs neomycin (Southern P. and Berg, P., J. Molec. Appl. Genet. 1: 327 (1982)), mycophenolic acid (Mulligan, R. C. and Berg, P. Science 209: 1422 (1980)) or hygromycin (Sugden, B. et al., Mol. Cell. Biol. 5: 410-413 (1985)). The three examples employ bacterial genes under eukaryotic control to convey resistance to the appropriate drug G418 or neomycin (geneticin), xgpt (mycophenolic acid) or hygromycin, respectively. Others include the neomycin analog G418 and puromycin.

E. Biosensor Riboswitches

Also disclosed are biosensor riboswitches. Biosensor riboswitches are engineered riboswitches that produce a detectable signal in the presence of their cognate trigger molecule. Useful biosensor riboswitches can be triggered at or above threshold levels of the trigger molecules. Biosensor riboswitches can be designed for use in vivo or in vitro. For example, biosensor riboswitches operably linked to a reporter RNA that encodes a protein that serves as or is involved in producing a signal can be used in vivo by engineering a cell or organism to harbor a nucleic acid construct encoding the riboswitch/reporter RNA. An example of a biosensor riboswitch for use in vitro is a riboswitch that includes a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a biosensor riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch.

F. Reporter Proteins and Peptides

For assessing activation of a riboswitch, or for biosensor riboswitches, a reporter protein or peptide can be used. The reporter protein or peptide can be encoded by the RNA the expression of which is regulated by the riboswitch. The examples describe the use of some specific reporter proteins. The use of reporter proteins and peptides is well known and can be adapted easily for use with riboswitches. The reporter proteins can be any protein or peptide that can be detected or that produces a detectable signal. Preferably, the presence of the protein or peptide can be detected using standard techniques (e.g., radioimmunoassay, radio-labeling, immunoassay, assay for enzymatic activity, absorbance, fluorescence, luminescence, and Western blot). More preferably, the level of the reporter protein is easily quantifiable using standard techniques even at low levels. Useful reporter proteins include luciferases, green fluorescent proteins and their derivatives, such as firefly luciferase (FL) from Photinus pyralis, and Renilla luciferase (RL) from Renilla reniformis.

G. Conformation Dependent Labels

Conformation dependent labels refer to all labels that produce a change in fluorescence intensity or wavelength based on a change in the form or conformation of the molecule or compound (such as a riboswitch) with which the label is associated. Examples of conformation dependent labels used in the context of probes and primers include molecular beacons, Amplifluors, FRET probes, cleavable FRET probes, TaqMan probes, scorpion primers, fluorescent triplex oligos including but not limited to triplex molecular beacons or triplex FRET probes, fluorescent water-soluble conjugated polymers, PNA probes and QPNA probes. Such labels, and, in particular, the principles of their function, can be adapted for use with riboswitches. Several types of conformation dependent labels are reviewed in Schweitzer and Kingsmore, Curr. Opin. Biotech. 12:21-27 (2001).

Stem quenched labels, a form of conformation dependent labels, are fluorescent labels positioned on a nucleic acid such that when a stem structure forms a quenching moiety is brought into proximity such that fluorescence from the label is quenched. When the stem is disrupted (such as when a riboswitch containing the label is activated), the quenching moiety is no longer in proximity to the fluorescent label and fluorescence increases. Examples of this effect can be found in molecular beacons, fluorescent triplex oligos, triplex molecular beacons, triplex FRET probes, and QPNA probes, the operational principles of which can be adapted for use with riboswitches.

Stem activated labels, a form of conformation dependent labels, are labels or pairs of labels where fluorescence is increased or altered by formation of a stem structure. Stem activated labels can include an acceptor fluorescent label and a donor moiety such that, when the acceptor and donor are in proximity (when the nucleic acid strands containing the labels form a stem structure), fluorescence resonance energy transfer from the donor to the acceptor causes the acceptor to fluoresce. Stem activated labels are typically pairs of labels positioned on nucleic acid molecules (such as riboswitches) such that the acceptor and donor are brought into proximity when a stem structure is formed in the nucleic acid molecule. If the donor moiety of a stem activated label is itself a fluorescent label, it can release energy as fluorescence (typically at a different wavelength than the fluorescence of the acceptor) when not in proximity to an acceptor (that is, when a stem structure is not formed). When the stem structure forms, the overall effect would then be a reduction of donor fluorescence and an increase in acceptor fluorescence. FRET probes are an example of the use of stem activated labels, the operational principles of which can be adapted for use with riboswitches.
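The dependence of energy transfer on donor-acceptor proximity can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50 percent efficient. This relation is general FRET background and is not stated in this disclosure; the distances and the R0 value below are hypothetical numbers chosen only to show the trend.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Forster relation: fraction of donor excitations transferred to the acceptor."""
    ratio = distance_nm / forster_radius_nm
    return 1.0 / (1.0 + ratio ** 6)


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Forster radius in nanometers
    # Short distance stands in for "stem formed", long distance for "stem open".
    for label, distance in (("stem formed", 2.0), ("stem open", 10.0)):
        print(label, round(fret_efficiency(distance, r0), 3))
```

Under these assumed values, transfer is nearly complete when the labels are close and nearly absent when they are far apart, which is consistent with the shift from donor to acceptor fluorescence described above.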

H. Detection Labels

To aid in detection and quantitation of riboswitch activation, deactivation or blocking, or expression of nucleic acids or protein produced upon activation, deactivation or blocking of riboswitches, detection labels can be incorporated into detection probes or detection molecules or directly incorporated into expressed nucleic acids or proteins. As used herein, a detection label is any molecule that can be associated with nucleic acid or protein, directly or indirectly, and which results in a measurable, detectable signal, either directly or indirectly. Many such labels are known to those of skill in the art. Examples of detection labels suitable for use in the disclosed method are radioactive isotopes, fluorescent molecules, phosphorescent molecules, enzymes, antibodies, and ligands.

Examples of suitable fluorescent labels include fluorescein isothiocyanate (FITC), 5,6-carboxymethyl fluorescein, Texas red, nitrobenz-2-oxa-1,3-diazol-4-yl (NBD), coumarin, dansyl chloride, rhodamine, amino-methyl coumarin (AMCA), Eosin, Erythrosin, BODIPY®, Cascade Blue®, Oregon Green®, pyrene, lissamine, xanthenes, acridines, oxazines, phycoerythrin, macrocyclic chelates of lanthanide ions such as quantum Dye™, fluorescent energy transfer dyes, such as thiazole orange-ethidium heterodimer, and the cyanine dyes Cy3, Cy3.5, Cy5, Cy5.5 and Cy7. Examples of other specific fluorescent labels include 3-Hydroxypyrene 5,8,10-Tri Sulfonic acid, 5-Hydroxy Tryptamine (5-HT), Acid Fuchsin, Alizarin Complexon, Alizarin Red, Allophycocyanin, Aminocoumarin, Anthroyl Stearate, Astrazon Brilliant Red 4G, Astrazon Orange R, Astrazon Red 6B, Astrazon Yellow 7 GLL, Atabrine, Auramine, Aurophosphine, Aurophosphine G, BAO 9 (Bisaminophenyloxadiazole), BCECF, Berberine Sulphate, Bisbenzamide, Blancophor FFG Solution, Blancophor SV, Bodipy F1, Brilliant Sulphoflavin FF, Calcein Blue, Calcium Green, Calcofluor RW Solution, Calcofluor White, Calcophor White ABT Solution, Calcophor White Standard Solution, Carbostyryl, Cascade Yellow, Catecholamine, Chinacrine, Coriphosphine O, Coumarin-Phalloidin, CY3.18, CY5.18, CY7, Dans (1-Dimethyl Amino Naphaline 5 Sulphonic Acid), Dansa (Diamino Naphtyl Sulphonic Acid), Dansyl NH-CH3, Diamino Phenyl Oxydiazole (DAO), Dimethylamino-5-Sulphonic acid, Dipyrrometheneboron Difluoride, Diphenyl Brilliant Flavine 7GFF, Dopamine, Erythrosin ITC, Euchrysin, FIF (Formaldehyde Induced Fluorescence), Flazo Orange, Fluo 3, Fluorescamine, Fura-2, Genacryl Brilliant Red B, Genacryl Brilliant Yellow 10GF, Genacryl Pink 3G, Genacryl Yellow 5GF, Glyoxalic Acid, Granular Blue, Haematoporphyrin, Indo-1, Intrawhite Cf Liquid, Leucophor PAF, Leucophor SF, Leucophor WS, Lissamine Rhodamine B200 (RD200), Lucifer Yellow CH, Lucifer Yellow VS, Magdala Red, Marina Blue, Maxilon Brilliant Flavin 10 GFF, Maxilon Brilliant Flavin 8 GFF, MPS (Methyl Green Pyronine Stilbene), Mithramycin, NBD Amine, Nitrobenzoxadiazole, Noradrenaline, Nuclear Fast Red, Nuclear Yellow, Nylosan Brilliant Flavin EBG, Oxadiazole, Pacific Blue, Pararosaniline (Feulgen), Phorwite AR Solution, Phorwite BKL, Phorwite Rev, Phorwite RPA, Phosphine 3R, Phthalocyanine, Phycoerythrin R, Polyazaindacene Pontochrome Blue Black, Porphyrin, Primuline, Procion Yellow, Pyronine, Pyronine B, Pyrozal Brilliant Flavin 7GF, Quinacrine Mustard, Rhodamine 123, Rhodamine 5 GLD, Rhodamine 6G, Rhodamine B, Rhodamine B 200, Rhodamine B Extra, Rhodamine BB, Rhodamine BG, Rhodamine WT, Serotonin, Sevron Brilliant Red 2B, Sevron Brilliant Red 4G, Sevron Brilliant Red B, Sevron Orange, Sevron Yellow L, SITS (Primuline), SITS (Stilbene Isothiosulphonic acid), Stilbene, Snarf 1, Sulpho Rhodamine B Can C, Sulpho Rhodamine G Extra, Tetracycline, Thiazine Red R, Thioflavin S, Thioflavin TCN, Thioflavin 5, Thiolyte, Thiazole Orange, Tinopol CBS, True Blue, Ultralite, Uranine B, Uvitex SFC, Xylene Orange, and XRITC.

Useful fluorescent labels are fluorescein (5-carboxyfluorescein-N-hydroxysuccinimide ester), rhodamine (5,6-tetramethyl rhodamine), and the cyanine dyes Cy3, Cy3.5, Cy5, Cy5.5 and Cy7. The absorption and emission maxima, respectively, for these fluors are: FITC (490 nm; 520 nm), Cy3 (554 nm; 568 nm), Cy3.5 (581 nm; 588 nm), Cy5 (652 nm; 672 nm), Cy5.5 (682 nm; 703 nm) and Cy7 (755 nm; 778 nm), thus allowing their simultaneous detection. Other examples of fluorescein dyes include 6-carboxyfluorescein (6-FAM), 2′,4′,1,4-tetrachlorofluorescein (TET), 2′,4′,5′,7′,1,4-hexachlorofluorescein (HEX), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyrhodamine (JOE), 2′-chloro-5′-fluoro-7′,8′-fused phenyl-1,4-dichloro-6-carboxyfluorescein (NED), and 2′-chloro-7′-phenyl-1,4-dichloro-6-carboxyfluorescein (VIC). Fluorescent labels can be obtained from a variety of commercial sources, including Amersham Pharmacia Biotech, Piscataway, N.J.; Molecular Probes, Eugene, Oreg.; and Research Organics, Cleveland, Ohio.
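The absorption and emission maxima listed above can be tabulated to see how far apart the fluors sit spectrally. The sketch below uses only the wavelengths stated in the text; the helper names `stokes_shift` and `closest_emission_pair` are illustrative and not part of any instrument software. Which separation is adequate for simultaneous detection depends on the filters and detector used, so no cutoff is assumed here.

```python
# (absorption maximum, emission maximum) in nanometers, as listed in the text.
FLUORS = {
    "FITC": (490, 520),
    "Cy3": (554, 568),
    "Cy3.5": (581, 588),
    "Cy5": (652, 672),
    "Cy5.5": (682, 703),
    "Cy7": (755, 778),
}


def stokes_shift(name):
    """Difference between the emission and absorption maxima of one fluor."""
    absorption, emission = FLUORS[name]
    return emission - absorption


def closest_emission_pair():
    """Return (gap_nm, fluor_a, fluor_b) for the two nearest emission maxima."""
    ordered = sorted(FLUORS.items(), key=lambda item: item[1][1])
    gaps = [
        (second[1][1] - first[1][1], first[0], second[0])
        for first, second in zip(ordered, ordered[1:])
    ]
    return min(gaps)


if __name__ == "__main__":
    print(closest_emission_pair())
```

With the listed values, the nearest emission maxima are those of Cy3 and Cy3.5, 20 nm apart, so that pair would be the most demanding to resolve in a multiplexed measurement.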

Additional labels of interest include those that provide for signal only when the probe with which they are associated is specifically bound to a target molecule, where such labels include: “molecular beacons” as described in Tyagi & Kramer, Nature Biotechnology (1996) 14:303 and EP 0 070 685 B1. Other labels of interest include those described in U.S. Pat. No. 5,563,037; WO 97/17471 and WO 97/17076.

Labeled nucleotides are a useful form of detection label for direct incorporation into expressed nucleic acids during synthesis. Examples of detection labels that can be incorporated into nucleic acids include nucleotide analogs such as BrdUrd (5-bromodeoxyuridine, Hoy and Schimke, Mutation Research 290:217-230 (1993)), aminoallyldeoxyuridine (Henegariu et al., Nature Biotechnology 18:345-348 (2000)), 5-methylcytosine (Sano et al., Biochim. Biophys. Acta 951:157-165 (1988)), bromouridine (Wansink et al., J. Cell Biology 122:283-293 (1993)) and nucleotides modified with biotin (Langer et al., Proc. Natl. Acad. Sci. USA 78:6633 (1981)) or with suitable haptens such as digoxygenin (Kerkhof, Anal. Biochem. 205:359-364 (1992)). Suitable fluorescence-labeled nucleotides are Fluorescein-isothiocyanate-dUTP, Cyanine-3-dUTP and Cyanine-5-dUTP (Yu et al., Nucleic Acids Res., 22:3226-3232 (1994)). A preferred nucleotide analog detection label for DNA is BrdUrd (bromodeoxyuridine, BrdUrd, BrdU, BUdR, Sigma-Aldrich Co.). Other useful nucleotide analogs for incorporation of detection label into DNA are AA-dUTP (aminoallyl-deoxyuridine triphosphate, Sigma-Aldrich Co.), and 5-methyl-dCTP (Roche Molecular Biochemicals). A useful nucleotide analog for incorporation of detection label into RNA is biotin-16-UTP (biotin-16-uridine-5′-triphosphate, Roche Molecular Biochemicals). Fluorescein, Cy3, and Cy5 can be linked to dUTP for direct labelling. Cy3.5 and Cy7 are available as avidin or anti-digoxygenin conjugates for secondary detection of biotin- or digoxygenin-labelled probes.

Detection labels that are incorporated into nucleic acid, such as biotin, can be subsequently detected using sensitive methods well-known in the art. For example, biotin can be detected using streptavidin-alkaline phosphatase conjugate (Tropix, Inc.), which is bound to the biotin and subsequently detected by chemiluminescence of suitable substrates (for example, chemiluminescent substrate CSPD: disodium, 3-(4-methoxyspiro-[1,2,-dioxetane-3-2′-(5′-chloro)tricyclo[3.3.1.1^(3,7)]decane]-4-yl) phenyl phosphate; Tropix, Inc.). Labels can also be enzymes, such as alkaline phosphatase, soybean peroxidase, horseradish peroxidase and polymerases, that can be detected, for example, with chemical signal amplification or by using a substrate to the enzyme which produces light (for example, a chemiluminescent 1,2-dioxetane substrate) or fluorescent signal.

Molecules that combine two or more of these detection labels are also considered detection labels. Any of the known detection labels can be used with the disclosed probes, tags, molecules and methods to label and detect activated or deactivated riboswitches or nucleic acid or protein produced in the disclosed methods. Methods for detecting and measuring signals generated by detection labels are also known to those of skill in the art. For example, radioactive isotopes can be detected by scintillation counting or direct visualization; fluorescent molecules can be detected with fluorescent spectrophotometers; phosphorescent molecules can be detected with a spectrophotometer or directly visualized with a camera; enzymes can be detected by detection or visualization of the product of a reaction catalyzed by the enzyme; antibodies can be detected by detecting a secondary detection label coupled to the antibody. As used herein, detection molecules are molecules which interact with a compound or composition to be detected and to which one or more detection labels are coupled.

I. Sequence Similarities

It is understood that, as discussed herein, the terms homology and identity mean the same thing as similarity. Thus, for example, if the word homology is used between two sequences (non-natural sequences, for example) it is understood that this is not necessarily indicating an evolutionary relationship between these two sequences, but rather is looking at the similarity or relatedness between their nucleic acid sequences. Many of the methods for determining homology between two evolutionarily related molecules are routinely applied to any two or more nucleic acids or proteins for the purpose of measuring sequence similarity regardless of whether they are evolutionarily related or not.

In general, it is understood that one way to define any known variants and derivatives, or those that might arise, of the disclosed riboswitches, aptamers, expression platforms, genes and proteins herein, is through defining the variants and derivatives in terms of homology to specific known sequences. This identity of particular sequences disclosed herein is also discussed elsewhere herein. In general, variants of riboswitches, aptamers, expression platforms, genes and proteins herein disclosed typically have at least about 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99 percent homology to a stated sequence or a native sequence. Those of skill in the art readily understand how to determine the homology of two proteins or nucleic acids, such as genes. For example, the homology can be calculated after aligning the two sequences so that the homology is at its highest level.

Another way of calculating homology can be performed by published algorithms. Optimal alignment of sequences for comparison can be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by inspection.
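As an illustration of the alignment-based approach, the following is a minimal sketch of a global alignment in the style of Needleman and Wunsch, followed by a percent identity calculation. It is not the GAP or BESTFIT implementation. The scoring values (match +1, mismatch -1, gap -1) and the convention of dividing matches by alignment length are assumptions made for this example; other conventions give different percentages.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Matching columns divided by alignment length, times 100."""
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    top, bottom = needleman_wunsch("GAUUACA", "GAUACA")
    print(top)
    print(bottom)
    print(round(percent_identity(top, bottom), 1))
```

For the two short hypothetical RNA sequences in the usage lines, the optimal alignment introduces a single gap and six of the seven columns match.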

The same types of homology can be obtained for nucleic acids by, for example, the algorithms disclosed in Zuker, M. Science 244:48-52, 1989; Jaeger et al. Proc. Natl. Acad. Sci. USA 86:7706-7710, 1989; and Jaeger et al. Methods Enzymol. 183:281-306, 1989, which are herein incorporated by reference for at least material related to nucleic acid alignment. It is understood that any of the methods typically can be used and that in certain instances the results of these various methods can differ, but the skilled artisan understands that if identity is found with at least one of these methods, the sequences would be said to have the stated identity.

For example, as used herein, a sequence recited as having a particular percent homology to another sequence refers to sequences that have the recited homology as calculated by any one or more of the calculation methods described above. For example, a first sequence has 80 percent homology, as defined herein, to a second sequence if the first sequence is calculated to have 80 percent homology to the second sequence using the Zuker calculation method even if the first sequence does not have 80 percent homology to the second sequence as calculated by any of the other calculation methods. As another example, a first sequence has 80 percent homology, as defined herein, to a second sequence if the first sequence is calculated to have 80 percent homology to the second sequence using both the Zuker calculation method and the Pearson and Lipman calculation method even if the first sequence does not have 80 percent homology to the second sequence as calculated by the Smith and Waterman calculation method, the Needleman and Wunsch calculation method, the Jaeger calculation methods, or any of the other calculation methods. As yet another example, a first sequence has 80 percent homology, as defined herein, to a second sequence if the first sequence is calculated to have 80 percent homology to the second sequence using each of the calculation methods (although, in practice, the different calculation methods will often result in different calculated homology percentages).
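The "any one or more methods" rule above can be illustrated with a minimal sketch. The code below is not the implementation of any package named in the text; it assumes a simple Needleman and Wunsch style global alignment with illustrative scores (match +1, mismatch -1, gap -1) and assumes that percent homology is counted as identical aligned positions divided by alignment length, which is only one of several conventions. The function names and the method labels in the example dictionary are hypothetical.

```python
from typing import Dict, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Global alignment by dynamic programming (illustrative scoring)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions over alignment length (assumed convention)."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def has_stated_homology(results: Dict[str, float], threshold: float) -> bool:
    """True if at least one calculation method reaches the threshold."""
    return any(value >= threshold for value in results.values())


# Hypothetical per-method results for one pair of sequences.
results = {"global_alignment": percent_identity("ACGU", "ACGA"),
           "other_method": 82.0}
print(has_stated_homology(results, 80.0))
```

The design point is the final function: the sequences count as having the stated homology as soon as any single method reports it, even when the other methods report a lower value.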

J. Hybridization and Selective Hybridization

The term hybridization typically means a sequence driven interaction between at least two nucleic acid molecules, such as a primer or a probe and a riboswitch or a gene. Sequence driven interaction means an interaction that occurs between two nucleotides or nucleotide analogs or nucleotide derivatives in a nucleotide specific manner. For example, G interacting with C or A interacting with T are sequence driven interactions. Typically sequence driven interactions occur on the Watson-Crick face or Hoogsteen face of the nucleotide. The hybridization of two nucleic acids is affected by a number of conditions and parameters known to those of skill in the art. For example, the salt concentrations, pH, and temperature of the reaction all affect whether two nucleic acid molecules will hybridize.

Parameters for selective hybridization between two nucleic acid molecules are well known to those of skill in the art. For example, in some embodiments selective hybridization conditions can be defined as stringent hybridization conditions. Stringency of hybridization is controlled by both temperature and salt concentration of either or both of the hybridization and washing steps. For example, the conditions of hybridization to achieve selective hybridization can involve hybridization in high ionic strength solution (6×SSC or 6×SSPE) at a temperature that is about 12-25° C. below the Tm (the melting temperature at which half of the molecules dissociate from their hybridization partners) followed by washing at a combination of temperature and salt concentration chosen so that the washing temperature is about 5° C. to 20° C. below the Tm. The temperature and salt conditions are readily determined empirically in preliminary experiments in which samples of reference DNA immobilized on filters are hybridized to a labeled nucleic acid of interest and then washed under conditions of different stringencies. Hybridization temperatures are typically higher for DNA-RNA and RNA-RNA hybridizations. The conditions can be used as described above to achieve stringency, or as is known in the art (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989; Kunkel et al. Methods Enzymol. 154:367, 1987, which is herein incorporated by reference for material at least related to hybridization of nucleic acids). A preferable stringent hybridization condition for a DNA:DNA hybridization can be at about 68° C. (in aqueous solution) in 6×SSC or 6×SSPE followed by washing at 68° C. Stringency of hybridization and washing, if desired, can be reduced accordingly as the degree of complementarity desired is decreased, and further, depending upon the G-C or A-T richness of any area wherein variability is searched for. Likewise, stringency of hybridization and washing, if desired, can be increased accordingly as homology desired is increased, and further, depending upon the G-C or A-T richness of any area wherein high homology is desired, all as known in the art.
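As a small worked illustration of the temperature ranges stated above (hybridization about 12-25° C. below the Tm, washing about 5° C. to 20° C. below the Tm), the following sketch computes both windows from a supplied Tm. The Tm is taken as an input, measured or estimated by whatever means the practitioner prefers; the function names are hypothetical, and the sketch does not replace the empirical determination described in the text.

```python
from typing import Tuple


def hybridization_window(tm_celsius: float) -> Tuple[float, float]:
    """Temperatures about 25 to 12 degrees C below the Tm, as (low, high)."""
    return (tm_celsius - 25.0, tm_celsius - 12.0)


def wash_window(tm_celsius: float) -> Tuple[float, float]:
    """Temperatures about 20 to 5 degrees C below the Tm, as (low, high)."""
    return (tm_celsius - 20.0, tm_celsius - 5.0)


# Example with an assumed Tm of 80 degrees C.
tm = 80.0
print("hybridize between", hybridization_window(tm))
print("wash between", wash_window(tm))
```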

Another way to define selective hybridization is by looking at the amount (percentage) of one of the nucleic acids bound to the other nucleic acid. For example, in some embodiments selective hybridization conditions would be when at least about 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent of the limiting nucleic acid is bound to the non-limiting nucleic acid. Typically, the non-limiting nucleic acid is in, for example, 10 or 100 or 1000 fold excess. This type of assay can be performed under conditions where both the limiting and non-limiting nucleic acids are, for example, 10 fold or 100 fold or 1000 fold below their k_d, or where only one of the nucleic acid molecules is 10 fold or 100 fold or 1000 fold below its k_d, or where one or both nucleic acid molecules are above their k_d.
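To make the relationship between concentration, the dissociation constant and percent bound concrete, here is a minimal sketch. It assumes simple 1:1 binding at equilibrium and assumes the non-limiting nucleic acid is in large enough excess that its free concentration is approximately its total concentration; neither assumption is stated in the text, and the function name is hypothetical.

```python
def percent_limiting_bound(nonlimiting_conc: float, kd: float) -> float:
    """Percent of the limiting strand bound, for simple 1:1 binding.

    Assumes the non-limiting strand is in large excess, so that
    fraction bound = [N] / ([N] + Kd). Both arguments must use the
    same concentration units.
    """
    if nonlimiting_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and kd must be > 0")
    return 100.0 * nonlimiting_conc / (nonlimiting_conc + kd)


# Non-limiting strand 10 fold below, equal to, and 10 fold above the Kd.
kd = 1e-9
for conc in (kd / 10, kd, kd * 10):
    print(conc, round(percent_limiting_bound(conc, kd), 1))
```

Under these assumptions, a non-limiting strand well below its dissociation constant leaves most of the limiting strand unbound, which is why the concentration regime matters when a percent-bound criterion is applied.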

Another way to define selective hybridization is by looking at the percentage of nucleic acid that gets enzymatically manipulated under conditions where hybridization is required to promote the desired enzymatic manipulation. For example, in some embodiments selective hybridization conditions would be when at least about 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent of the nucleic acid is enzymatically manipulated under conditions which promote the enzymatic manipulation. For example, if the enzymatic manipulation is DNA extension, then selective hybridization conditions would be when at least about 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent of the nucleic acid molecules are extended. Preferred conditions also include those suggested by the manufacturer or indicated in the art as being appropriate for the enzyme performing the manipulation.

Just as with homology, it is understood that there are a variety of methods herein disclosed for determining the level of hybridization between two nucleic acid molecules. It is understood that these methods and conditions can provide different percentages of hybridization between two nucleic acid molecules, but unless otherwise indicated meeting the parameters of any of the methods would be sufficient. For example, if 80% hybridization is required, then as long as hybridization occurs within the required parameters in any one of these methods, it is considered disclosed herein.

It is understood that those of skill in the art understand that if a composition or method meets any one of these criteria for determining hybridization, either collectively or singly, it is a composition or method that is disclosed herein.

K. Nucleic Acids

There are a variety of molecules disclosed herein that are nucleic acid based, including, for example, riboswitches, aptamers, and nucleic acids that encode riboswitches and aptamers. The disclosed nucleic acids can be made up of, for example, nucleotides, nucleotide analogs, or nucleotide substitutes. Non-limiting examples of these and other molecules are discussed herein. It is understood that, for example, when a vector is expressed in a cell, the expressed mRNA will typically be made up of A, C, G, and U. Likewise, it is understood that if a nucleic acid molecule is introduced into a cell or cell environment through, for example, exogenous delivery, it is advantageous that the nucleic acid molecule be made up of nucleotide analogs that reduce the degradation of the nucleic acid molecule in the cellular environment.

So long as their relevant function is maintained, riboswitches, aptamers, expression platforms and any other oligonucleotides and nucleic acids can be made up of or include modified nucleotides (nucleotide analogs). Many modified nucleotides are known and can be used in oligonucleotides and nucleic acids. A nucleotide analog is a nucleotide which contains some type of modification to either the base, sugar, or phosphate moieties. Modifications to the base moiety would include natural and synthetic modifications of A, C, G, and T/U as well as different purine or pyrimidine bases, such as uracil-5-yl, hypoxanthin-9-yl (I), and 2-aminoadenin-9-yl. A modified base includes but is not limited to 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Additional base modifications can be found for example in U.S. Pat. No. 3,687,808, Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B. ed., CRC Press, 1993. Certain nucleotide analogs, such as 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil, 5-propynylcytosine, and 5-methylcytosine, can increase the stability of duplex formation. Other modified bases are those that function as universal bases. Universal bases include 3-nitropyrrole and 5-nitroindole. Universal bases substitute for the normal bases but have no bias in base pairing. That is, universal bases can base pair with any other base. Base modifications often can be combined with, for example, a sugar modification, such as 2′-O-methoxyethyl, to achieve unique properties such as increased duplex stability. There are numerous United States patents, such as U.S. Pat. Nos. 4,845,205; 5,130,302; 5,134,066; 5,175,273; 5,367,066; 5,432,272; 5,457,187; 5,459,255; 5,484,908; 5,502,177; 5,525,711; 5,552,540; 5,587,469; 5,594,121; 5,596,091; 5,614,617; and 5,681,941, which detail and describe a range of base modifications. Each of these patents is herein incorporated by reference in its entirety, and specifically for their description of base modifications, their synthesis, their use, and their incorporation into oligonucleotides and nucleic acids.

Nucleotide analogs can also include modifications of the sugar moiety. Modifications to the sugar moiety would include natural modifications of the ribose and deoxyribose as well as synthetic modifications. Sugar modifications include but are not limited to the following modifications at the 2′ position: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S- or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl can be substituted or unsubstituted C1 to C10 alkyl or C2 to C10 alkenyl and alkynyl. 2′ sugar modifications also include but are not limited to -O[(CH₂)ₙO]ₘCH₃, -O(CH₂)ₙOCH₃, -O(CH₂)ₙNH₂, -O(CH₂)ₙCH₃, -O(CH₂)ₙ-ONH₂, and -O(CH₂)ₙON[(CH₂)ₙCH₃]₂, where n and m are from 1 to about 10.

Other modifications at the 2′ position include but are not limited to: C1 to C10 lower alkyl, substituted lower alkyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH₃, OCN, Cl, Br, CN, CF₃, OCF₃, SOCH₃, SO₂CH₃, ONO₂, NO₂, N₃, NH₂, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. Similar modifications can also be made at other positions on the sugar, particularly the 3′ position of the sugar on the 3′ terminal nucleotide or in 2′-5′ linked oligonucleotides and the 5′ position of the 5′ terminal nucleotide. Modified sugars would also include those that contain modifications at the bridging ring oxygen, such as CH₂ and S. Nucleotide sugar analogs can also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar. There are numerous United States patents that teach the preparation of such modified sugar structures, such as U.S. Pat. Nos. 4,981,957; 5,118,800; 5,319,080; 5,359,044; 5,393,878; 5,446,137; 5,466,786; 5,514,785; 5,519,134; 5,567,811; 5,576,427; 5,591,722; 5,597,909; 5,610,300; 5,627,053; 5,639,873; 5,646,265; 5,658,873; 5,670,633; and 5,700,920, each of which is herein incorporated by reference in its entirety, and specifically for their description of modified sugar structures, their synthesis, their use, and their incorporation into nucleotides, oligonucleotides and nucleic acids.

Nucleotide analogs can also be modified at the phosphate moiety. Modified phosphate moieties include but are not limited to those that can be modified so that the linkage between two nucleotides contains a phosphorothioate, chiral phosphorothioate, phosphorodithioate, phosphotriester, aminoalkylphosphotriester, methyl and other alkyl phosphonates including 3′-alkylene phosphonate and chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates. It is understood that these phosphate or modified phosphate linkages between two nucleotides can be through a 3′-5′ linkage or a 2′-5′ linkage, and the linkage can contain inverted polarity such as 3′-5′ to 5′-3′ or 2′-5′ to 5′-2′. Various salts, mixed salts and free acid forms are also included. Numerous United States patents teach how to make and use nucleotides containing modified phosphates and include but are not limited to U.S. Pat. Nos. 3,687,808; 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050, each of which is herein incorporated by reference in its entirety, and specifically for their description of modified phosphates, their synthesis, their use, and their incorporation into nucleotides, oligonucleotides and nucleic acids.

It is understood that nucleotide analogs need only contain a single modification, but can also contain multiple modifications within one of the moieties or between different moieties.

Nucleotide substitutes are molecules having similar functional properties to nucleotides, but which do not contain a phosphate moiety, such as peptide nucleic acid (PNA). Nucleotide substitutes are molecules that will recognize and hybridize to (base pair to) complementary nucleic acids in a Watson-Crick or Hoogsteen manner, but which are linked together through a moiety other than a phosphate moiety. Nucleotide substitutes are able to conform to a double helix type structure when interacting with the appropriate target nucleic acid.

Nucleotide substitutes are nucleotides or nucleotide analogs that have had the phosphate moiety and/or sugar moieties replaced. Nucleotide substitutes do not contain a standard phosphorus atom. Substitutes for the phosphate can be, for example, short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH₂ component parts. Numerous United States patents disclose how to make and use these types of phosphate replacements and include but are not limited to U.S. Pat. Nos. 5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,610,289; 5,608,046; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439, each of which is herein incorporated by reference in its entirety, and specifically for their description of phosphate replacements, their synthesis, their use, and their incorporation into nucleotides, oligonucleotides and nucleic acids.

It is also understood that in a nucleotide substitute both the sugar and the phosphate moieties of the nucleotide can be replaced by, for example, an amide type linkage (aminoethylglycine) (PNA). U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262 teach how to make and use PNA molecules, each of which is herein incorporated by reference. (See also Nielsen et al., Science 254:1497-1500 (1991)).

Oligonucleotides and nucleic acids can be composed of nucleotides and can be made up of different types of nucleotides or the same type of nucleotides. For example, one or more of the nucleotides in an oligonucleotide can be ribonucleotides, 2′-O-methyl ribonucleotides, or a mixture of ribonucleotides and 2′-O-methyl ribonucleotides; about 10% to about 50% of the nucleotides can be ribonucleotides, 2′-O-methyl ribonucleotides, or a mixture of ribonucleotides and 2′-O-methyl ribonucleotides; about 50% or more of the nucleotides can be ribonucleotides, 2′-O-methyl ribonucleotides, or a mixture of ribonucleotides and 2′-O-methyl ribonucleotides; or all of the nucleotides are ribonucleotides, 2′-O-methyl ribonucleotides, or a mixture of ribonucleotides and 2′-O-methyl ribonucleotides. Such oligonucleotides and nucleic acids can be referred to as chimeric oligonucleotides and chimeric nucleic acids.

L. Solid Supports

Solid supports are solid-state substrates or supports with which molecules (such as trigger molecules) and riboswitches (or other components used in, or produced by, the disclosed methods) can be associated. Riboswitches and other molecules can be associated with solid supports directly or indirectly. For example, analytes (e.g., trigger molecules, test compounds) can be bound to the surface of a solid support or associated with capture agents (e.g., compounds or molecules that bind an analyte) immobilized on solid supports. As another example, riboswitches can be bound to the surface of a solid support or associated with probes immobilized on solid supports. An array is a solid support to which multiple riboswitches, probes or other molecules have been associated in an array, grid, or other organized pattern.

Solid-state substrates for use in solid supports can include any solid material with which components can be associated, directly or indirectly. This includes materials such as acrylamide, agarose, cellulose, nitrocellulose, glass, gold, polystyrene, polyethylene vinyl acetate, polypropylene, polymethacrylate, polyethylene, polyethylene oxide, polysilicates, polycarbonates, Teflon, fluorocarbons, nylon, silicon rubber, polyanhydrides, polyglycolic acid, polylactic acid, polyorthoesters, functionalized silane, polypropylfumarate, collagen, glycosaminoglycans, and polyamino acids. Solid-state substrates can have any useful form including thin film, membrane, bottles, dishes, fibers, woven fibers, shaped polymers, particles, beads, microparticles, or a combination. Solid-state substrates and solid supports can be porous or non-porous. A chip is a rectangular or square small piece of material. Preferred forms for solid-state substrates are thin films, beads, or chips. A useful form for a solid-state substrate is a microtiter dish. In some embodiments, a multiwell glass slide can be employed.

An array can include a plurality of riboswitches, trigger molecules, other molecules, compounds or probes immobilized at identified or predefined locations on the solid support. Each predefined location on the solid support generally has one type of component (that is, all the components at that location are the same). Alternatively, multiple types of components can be immobilized in the same predefined location on a solid support. Each location will have multiple copies of the given components. The spatial separation of different components on the solid support allows separate detection and identification.
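The layout described above (components at predefined locations, usually one type per location, optionally several) maps naturally onto a lookup table keyed by location. The following is a minimal sketch of such a bookkeeping structure; the class name, the (row, column) addressing and the component labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # assumed (row, column) address on the support


class ComponentArray:
    """Records which component types sit at which predefined location."""

    def __init__(self) -> None:
        self._spots: Dict[Location, List[str]] = defaultdict(list)

    def immobilize(self, location: Location, component: str) -> None:
        """Register a component type at a location (no duplicates per spot)."""
        if component not in self._spots[location]:
            self._spots[location].append(component)

    def components_at(self, location: Location) -> List[str]:
        """Component types at a location; empty list if the spot is unused."""
        return list(self._spots.get(location, []))

    def locate(self, component: str) -> List[Location]:
        """All locations that carry the given component type."""
        return sorted(loc for loc, items in self._spots.items()
                      if component in items)

    def distinct_components(self) -> int:
        """Number of different component types on the whole support."""
        return len({c for items in self._spots.values() for c in items})


array = ComponentArray()
array.immobilize((0, 0), "riboswitch_A")
array.immobilize((0, 1), "riboswitch_B")
array.immobilize((0, 1), "probe_C")  # two types sharing one location
print(array.locate("probe_C"), array.distinct_components())
```

Because a signal read at a known address can be traced back through `components_at`, the spatial separation is what permits separate detection and identification.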

Although useful, it is not required that the solid support be a single unit or structure. A set of riboswitches, trigger molecules, other molecules, compounds and/or probes can be distributed over any number of solid supports. For example, at one extreme, each component can be immobilized in a separate reaction tube or container, or on separate beads or microparticles.

Methods for immobilization of oligonucleotides to solid-state substrates are well established. Oligonucleotides, including address probes and detection probes, can be coupled to substrates using established coupling methods. For example, suitable attachment methods are described by Pease et al., Proc. Natl. Acad. Sci. USA 91(11):5022-5026 (1994), and Khrapko et al., Mol Biol (Mosk) (USSR) 25:718-730 (1991). A method for immobilization of 3′-amine oligonucleotides on casein-coated slides is described by Stimpson et al., Proc. Natl. Acad. Sci. USA 92:6379-6383 (1995). A useful method of attaching oligonucleotides to solid-state substrates is described by Guo et al., Nucleic Acids Res. 22:5456-5465 (1994).

Each of the components (for example, riboswitches, trigger molecules, or other molecules) immobilized on the solid support can be located in a different predefined region of the solid support. The different locations can be different reaction chambers. Each of the different predefined regions can be physically separated from each of the other regions. The distance between the different predefined regions of the solid support can be either fixed or variable. For example, in an array, each of the components can be arranged at fixed distances from each other, while components associated with beads will not be in a fixed spatial relationship. In particular, the use of multiple solid support units (for example, multiple beads) will result in variable distances.

Components can be associated or immobilized on a solid support at any density. Components can be immobilized to the solid support at a density exceeding 400 different components per cubic centimeter. Arrays of components can have any number of components. For example, an array can have at least 1,000 different components immobilized on the solid support, at least 10,000 different components immobilized on the solid support, at least 100,000 different components immobilized on the solid support, or at least 1,000,000 different components immobilized on the solid support.

M. Kits

The materials described above as well as other materials can be packaged together in any suitable combination as a kit useful for performing, or aiding in the performance of, the disclosed method. It is useful if the kit components in a given kit are designed and adapted for use together in the disclosed method. For example, disclosed are kits for detecting compounds, the kit comprising one or more biosensor riboswitches. The kits also can contain reagents and labels for detecting activation of the riboswitches.

N. Mixtures

Disclosed are mixtures formed by performing or preparing to perform the disclosed method. For example, disclosed are mixtures comprising riboswitches and trigger molecules.

Whenever the method involves mixing or bringing into contact compositions or components or reagents, performing the method creates a number of different mixtures. For example, if the method includes 3 mixing steps, after each one of these steps a unique mixture is formed if the steps are performed separately. In addition, a mixture is formed at the completion of all of the steps regardless of how the steps were performed. The present disclosure contemplates these mixtures, obtained by the performance of the disclosed methods, as well as mixtures containing any reagent, composition, or component disclosed herein.

O. Systems

Disclosed are systems useful for performing, or aiding in the performance of, the disclosed method. Systems generally comprise combinations of articles of manufacture such as structures, machines, devices, and the like, and compositions, compounds, materials, and the like. Such combinations that are disclosed or that are apparent from the disclosure are contemplated. For example, disclosed and contemplated are systems comprising biosensor riboswitches, a solid support and a signal-reading device.

P. Data Structures and Computer Control

Disclosed are data structures used in, generated by, or generated from, the disclosed method. Data structures generally are any form of data, information, and/or objects collected, organized, stored, and/or embodied in a composition or medium. Riboswitch structures and activation measurements stored in electronic form, such as in RAM or on a storage disk, are a type of data structure.
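As one possible illustration of activation measurements stored in electronic form, the sketch below defines a record type and serializes a list of records to JSON text that could be written to a storage disk. The field names, units and the choice of JSON are assumptions made for this example and are not specified in the disclosure.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ActivationMeasurement:
    """One activation reading for one riboswitch (hypothetical fields)."""
    riboswitch_id: str
    sequence: str                       # RNA sequence as A, C, G, U
    trigger_molecule: str
    trigger_concentration_molar: float
    signal: float                       # reporter signal, arbitrary units


def to_json(records: List[ActivationMeasurement]) -> str:
    """Serialize measurements to JSON text suitable for storage."""
    return json.dumps([asdict(r) for r in records])


def from_json(text: str) -> List[ActivationMeasurement]:
    """Rebuild measurement records from stored JSON text."""
    return [ActivationMeasurement(**item) for item in json.loads(text)]


records = [ActivationMeasurement("rs-001", "GGACGU", "example_trigger", 1e-6, 0.42)]
stored = to_json(records)
print(from_json(stored) == records)
```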

The disclosed method, or any part thereof or preparation therefor, can be controlled, managed, or otherwise assisted by computer control. Such computer control can be accomplished by a computer controlled process or method, can use and/or generate data structures, and can use a computer program. Such computer control, computer controlled processes, data structures, and computer programs are contemplated and should be understood to be disclosed herein.

Methods

Disclosed are methods for activating, deactivating or blocking a riboswitch. Such methods can involve, for example, bringing into contact a riboswitch and a compound or trigger molecule that can activate, deactivate or block the riboswitch. Riboswitches function to control gene expression through the binding or removal of a trigger molecule. Compounds can be used to activate, deactivate or block a riboswitch. The trigger molecule for a riboswitch (as well as other activating compounds) can be used to activate a riboswitch. Compounds other than the trigger molecule generally can be used to deactivate or block a riboswitch. Riboswitches can also be deactivated by, for example, removing trigger molecules from the presence of the riboswitch. Thus, the disclosed method of deactivating a riboswitch can involve, for example, removing a trigger molecule (or other activating compound) from the presence of, or contact with, the riboswitch. A riboswitch can be blocked by, for example, binding of an analog of the trigger molecule that does not activate the riboswitch.

Also disclosed are methods for altering expression of an RNA molecule, or of a gene encoding an RNA molecule, where the RNA molecule includes a riboswitch, by bringing a compound into contact with the RNA molecule. Riboswitches function to control gene expression through the binding or removal of a trigger molecule. Thus, subjecting an RNA molecule of interest that includes a riboswitch to conditions that activate, deactivate or block the riboswitch can be used to alter expression of the RNA. Expression can be altered as a result of, for example, termination of transcription or blocking of ribosome binding to the RNA. Binding of a trigger molecule can, depending on the nature of the riboswitch, reduce or prevent expression of the RNA molecule or promote or increase expression of the RNA molecule.

Also disclosed are methods for regulating expression of an RNA molecule, or of a gene encoding an RNA molecule, by operably linking a riboswitch to the RNA molecule. A riboswitch can be operably linked to an RNA molecule in any suitable manner, including, for example, by physically joining the riboswitch to the RNA molecule or by engineering nucleic acid encoding the RNA molecule to include and encode the riboswitch such that the RNA produced from the engineered nucleic acid has the riboswitch operably linked to the RNA molecule. Subjecting a riboswitch operably linked to an RNA molecule of interest to conditions that activate, deactivate or block the riboswitch can be used to alter expression of the RNA.

Also disclosed are methods for regulating expression of a naturally occurring gene or RNA that contains a riboswitch by activating, deactivating or blocking the riboswitch. If the gene is essential for survival of a cell or organism that harbors it, activating, deactivating or blocking the riboswitch can result in death, stasis or debilitation of the cell or organism. For example, activating a naturally occurring riboswitch in a naturally occurring gene that is essential to survival of a microorganism can result in death of the microorganism (if activation of the riboswitch turns off or represses expression). This is one basis for the use of the disclosed compounds and methods for antimicrobial and antibiotic effects.

Also disclosed are methods for regulating expression of an isolated, engineered or recombinant gene or RNA that contains a riboswitch by activating, deactivating or blocking the riboswitch. The gene or RNA can be engineered or can be recombinant in any manner. For example, the riboswitch and coding region of the RNA can be heterologous, the riboswitch can be recombinant or chimeric, or both. If the gene encodes a desired expression product, activating or deactivating the riboswitch can be used to induce expression of the gene and thus result in production of the expression product. If the gene encodes an inducer or repressor of gene expression or of another cellular process, activation, deactivation or blocking of the riboswitch can result in induction, repression, or de-repression of other, regulated genes or cellular processes. Many such secondary regulatory effects are known and can be adapted for use with riboswitches. An advantage of riboswitches as the primary control for such regulation is that riboswitch trigger molecules can be small, non-antigenic molecules.

Also disclosed are methods for altering the regulation of a riboswitch by operably linking an aptamer domain to the expression platform domain of the riboswitch (the result being a chimeric riboswitch). The aptamer domain can then mediate regulation of the riboswitch through the action of, for example, a trigger molecule for the aptamer domain. Aptamer domains can be operably linked to expression platform domains of riboswitches in any suitable manner, including, for example, by replacing the normal or natural aptamer domain of the riboswitch with the new aptamer domain. Generally, any compound or condition that can activate, deactivate or block the riboswitch from which the aptamer domain is derived can be used to activate, deactivate or block the chimeric riboswitch.

Also disclosed are methods for inactivating a riboswitch by covalently altering the riboswitch (by, for example, crosslinking parts of the riboswitch or coupling a compound to the riboswitch). Inactivation of a riboswitch in this manner can result from, for example, an alteration that prevents the trigger molecule for the riboswitch from binding, that prevents the change in state of the riboswitch upon binding of the trigger molecule, or that prevents the expression platform domain of the riboswitch from affecting expression upon binding of the trigger molecule.

Also disclosed are methods for selecting, designing or deriving new riboswitches and/or new aptamers that recognize new trigger molecules. Such methods can involve production of a set of aptamer variants in a riboswitch, assessing the activation of the variant riboswitches in the presence of a compound of interest, selecting variant riboswitches that were activated (or, for example, the riboswitches that were the most highly or the most selectively activated), and repeating these steps until a variant riboswitch of a desired activity, specificity, combination of activity and specificity, or other combination of properties results. Also disclosed are riboswitches and aptamer domains produced by these methods.

Techniques for in vitro selection and in vitro evolution of functional nucleic acid molecules are known and can be adapted for use with riboswitches and their components. Useful techniques are described by, for example, A. Roth and R. R. Breaker (2003) Selection in vitro of allosteric ribozymes. In: Methods in Molecular Biology Series: Catalytic Nucleic Acid Protocols (Sioud, M., ed.), Humana, Totowa, N.J.; R. R. Breaker (2002) Engineered Allosteric Ribozymes as Biosensor Components. Curr. Opin. Biotechnol. 13:31-39; G. M. Emilsson and R. R. Breaker (2002) Deoxyribozymes: New Activities and New Applications. Cell. Mol. Life Sci. 59:596-607; Y. Li, R. R. Breaker (2001) In vitro Selection of Kinase and Ligase Deoxyribozymes. Methods 23:179-190; G. A. Soukup, R. R. Breaker (2000) Allosteric Ribozymes. In: Ribozymes: Biology and Biotechnology. R. K. Gaur and G. Krupp eds. Eaton Publishing; G. A. Soukup, R. R. Breaker (2000) Allosteric Nucleic Acid Catalysts. Curr. Opin. Struct. Biol. 10:318-325; G. A. Soukup, R. R. Breaker (1999) Nucleic Acid Molecular Switches. Trends Biotechnol. 17:469-476; R. R. Breaker (1999) In vitro Selection of Self-cleaving Ribozymes and Deoxyribozymes. In: Intracellular Ribozyme Applications: Principles and Protocols. L. Couture, J. Rossi eds. Horizon Scientific Press, Norfolk, England; R. R. Breaker (1997) In vitro Selection of Catalytic Polynucleotides. Chem. Rev. 97:371-390; and references cited therein; each of these publications being specifically incorporated herein by reference for their description of in vitro selection and evolution techniques.

Also disclosed are methods for selecting and identifying compounds that can activate, deactivate or block a riboswitch. Activation of a riboswitch refers to the change in state of the riboswitch upon binding of a trigger molecule. A riboswitch can be activated by compounds other than the trigger molecule and in ways other than binding of a trigger molecule. The term trigger molecule is used herein to refer to molecules and compounds that can activate a riboswitch. This includes the natural or normal trigger molecule for the riboswitch and other compounds that can activate the riboswitch. Natural or normal trigger molecules are the trigger molecule for a given riboswitch in nature or, in the case of some non-natural riboswitches, the trigger molecule for which the riboswitch was designed or with which the riboswitch was selected (as in, for example, in vitro selection or in vitro evolution techniques). Trigger molecules other than the natural or normal trigger molecule can be referred to as non-natural trigger molecules.

Deactivation of a riboswitch refers to the change in state of the riboswitch when the trigger molecule is not bound. A riboswitch can be deactivated by binding of compounds other than the trigger molecule and in ways other than removal of the trigger molecule. Blocking of a riboswitch refers to a condition or state of the riboswitch where the presence of the trigger molecule does not activate the riboswitch.

Also disclosed are methods of identifying compounds that activate, deactivate or block a riboswitch. For example, compounds that activate a riboswitch can be identified by bringing into contact a test compound and a riboswitch and assessing activation of the riboswitch. If the riboswitch is activated, the test compound is identified as a compound that activates the riboswitch. Activation of a riboswitch can be assessed in any suitable manner. For example, the riboswitch can be linked to a reporter RNA and expression, expression level, or change in expression level of the reporter RNA can be measured in the presence and absence of the test compound. As another example, the riboswitch can include a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch. As can be seen, assessment of activation of a riboswitch can be performed with the use of a control assay or measurement or without the use of a control assay or measurement. Methods for identifying compounds that deactivate a riboswitch can be performed in analogous ways.

Identification of compounds that block a riboswitch can be accomplished in any suitable manner. For example, an assay can be performed for assessing activation or deactivation of a riboswitch in the presence of a compound known to activate or deactivate the riboswitch and in the presence of a test compound. If activation or deactivation is not observed as would be observed in the absence of the test compound, then the test compound is identified as a compound that blocks activation or deactivation of the riboswitch.

Also disclosed are methods of detecting compounds using biosensor riboswitches. The method can include bringing into contact a test sample and a biosensor riboswitch and assessing the activation of the biosensor riboswitch. Activation of the biosensor riboswitch indicates the presence of the trigger molecule for the biosensor riboswitch in the test sample. Biosensor riboswitches are engineered riboswitches that produce a detectable signal in the presence of their cognate trigger molecule. Useful biosensor riboswitches can be triggered at or above threshold levels of the trigger molecules. Biosensor riboswitches can be designed for use in vivo or in vitro. For example, biosensor riboswitches operably linked to a reporter RNA that encodes a protein that serves as or is involved in producing a signal can be used in vivo by engineering a cell or organism to harbor a nucleic acid construct encoding the riboswitch/reporter RNA. An example of a biosensor riboswitch for use in vitro is a riboswitch that includes a conformation dependent label, the signal from which changes depending on the activation state of the riboswitch. Such a biosensor riboswitch preferably uses an aptamer domain from or derived from a naturally occurring riboswitch.

Biosensor riboswitches can be used to monitor changing conditions because riboswitch activation is reversible when the concentration of the trigger molecule falls, and so the signal can vary as the concentration of the trigger molecule varies. The range of concentration of trigger molecules that can be detected can be varied by engineering riboswitches having different dissociation constants for the trigger molecule. This can easily be accomplished by, for example, “degrading” the sensitivity of a riboswitch having high affinity for the trigger molecule. A range of concentrations can be monitored by using multiple biosensor riboswitches of different sensitivities in the same sensor or assay.
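The relationship between dissociation constant and detectable concentration range can be illustrated with a minimal sketch. It assumes simple 1:1 binding with the trigger molecule in excess over the riboswitch, so that fractional occupancy is c / (c + K_D); the 10% and 90% occupancy cutoffs and the three K_D values in the panel are hypothetical choices made only for illustration.

```python
def fraction_bound(conc, kd):
    """Fraction of riboswitch occupied at ligand concentration `conc`.

    Assumes 1:1 binding with ligand in excess; `conc` and `kd` share units.
    """
    return conc / (conc + kd)


def responsive_range(kd, low=0.1, high=0.9):
    """Concentrations that give `low` and `high` fractional occupancy."""
    return (kd * low / (1 - low), kd * high / (1 - high))


# Hypothetical panel of biosensor riboswitches (K_D values in nM).
panel_kd_nm = [10.0, 1_000.0, 100_000.0]

for kd in panel_kd_nm:
    lo, hi = responsive_range(kd)
    print(f"K_D = {kd:g} nM: responds from about {lo:.3g} nM to {hi:.3g} nM")
```

Under this model each sensor responds over roughly an 81-fold concentration window centered on its K_D, so sensors whose K_D values are spaced about 100-fold apart would tile a wide concentration range with little overlap.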

Also disclosed are compounds made by identifying a compound that activates, deactivates or blocks a riboswitch and manufacturing the identified compound. This can be accomplished by, for example, combining compound identification methods as disclosed elsewhere herein with methods for manufacturing the identified compounds. For example, compounds can be made by bringing into contact a test compound and a riboswitch, assessing activation of the riboswitch, and, if the riboswitch is activated by the test compound, manufacturing the test compound that activates the riboswitch as the compound.

Also disclosed are compounds made by checking activation, deactivation or blocking of a riboswitch by a compound and manufacturing the checked compound. This can be accomplished by, for example, combining compound activation, deactivation or blocking assessment methods as disclosed elsewhere herein with methods for manufacturing the checked compounds. For example, compounds can be made by bringing into contact a test compound and a riboswitch, assessing activation of the riboswitch, and, if the riboswitch is activated by the test compound, manufacturing the test compound that activates the riboswitch as the compound. Checking compounds for their ability to activate, deactivate or block a riboswitch refers to both identification of compounds previously unknown to activate, deactivate or block a riboswitch and to assessing the ability of a compound to activate, deactivate or block a riboswitch where the compound was already known to activate, deactivate or block the riboswitch.

Disclosed is a method of detecting a compound of interest, the method comprising bringing into contact a sample and a riboswitch, wherein the riboswitch is activated by the compound of interest, wherein the riboswitch produces a signal when activated by the compound of interest, wherein the riboswitch produces a signal when the sample contains the compound of interest. The riboswitch can change conformation when activated by the compound of interest, wherein the change in conformation produces a signal via a conformation dependent label. The riboswitch can change conformation when activated by the compound of interest, wherein the change in conformation causes a change in expression of an RNA linked to the riboswitch, wherein the change in expression produces a signal. The signal can be produced by a reporter protein expressed from the RNA linked to the riboswitch.

Disclosed is a method comprising (a) testing a compound for inhibition of gene expression of a gene encoding an RNA comprising a riboswitch, wherein the inhibition is via the riboswitch, and (b) inhibiting gene expression by bringing into contact a cell and a compound that inhibited gene expression in step (a), wherein the cell comprises a gene encoding an RNA comprising a riboswitch, wherein the compound inhibits expression of the gene by binding to the riboswitch.

Also disclosed is a method of identifying riboswitches, the method comprising assessing in-line spontaneous cleavage of an RNA molecule in the presence and absence of a compound, wherein the RNA molecule is encoded by a gene regulated by the compound, wherein a change in the pattern of in-line spontaneous cleavage of the RNA molecule indicates a riboswitch.

A. Identification of Antimicrobial Compounds

Riboswitches are a new class of structured RNAs that have evolved for the purpose of binding small organic molecules. The natural binding pocket of riboswitches can be targeted with metabolite analogs or by compounds that mimic the shape-space of the natural metabolite. Riboswitches are: (1) found in numerous Gram-positive and Gram-negative bacteria including Bacillus anthracis, (2) fundamental regulators of gene expression in these bacteria, (3) present in multiple copies that would be unlikely to evolve simultaneous resistance, and (4) not yet proven to exist in humans. This combination of features makes riboswitches attractive targets for new antimicrobial compounds. Further, the small molecule ligands of riboswitches provide useful sites for derivatization to produce drug candidates.

Once a class of riboswitch has been identified and its potential as a drug target assessed (by, for example, determining how many genes in a target organism are regulated by that class of riboswitch), candidate molecules can be identified. The following provides an illustration of this using the SAM riboswitch (see Example 7).

SAM analogs that substitute the reactive methyl and sulfonium ion center with stable sulfur-based linkages (YBD-2 and YBD3) are recognized with adequate affinity (low to mid-nanomolar range) by the riboswitch to serve as a platform for synthesis of additional SAM analogs. In addition, a wider range of linkage analogs (N- and C-based linkages) can be synthesized and tested to provide the optimal platform upon which to make amino acid and nucleoside derivations.

Sulfoxide and sulfone derivatives of SAM can be used to generate analogs. Established synthetic protocols described in Ronald T. Borchardt and Yih Shiong Wu, Potential inhibitor of S-adenosylmethionine-dependent methyltransferase. 1. Modification of the amino acid portion of S-adenosylhomocysteine. J. Med. Chem. 17, 862-868, 1974, can be used, for example. These and other analogs can be synthesized and assayed for binding sequentially or in small groups. Additional SAM analogs can be designed during the progression of compound identification based on the recognition determinants that are established in each round. Simple binding assays can be conducted on B. subtilis and B. anthracis riboswitch RNAs as described elsewhere herein. More advanced assays can also be used.

The most promising SAM analog lead compounds must enter bacterial cells and bind riboswitches while remaining metabolically inert. In addition, useful SAM analogs must be bound tightly by the riboswitch, but must also fail to compete with SAM for the active sites of protein enzymes, or there is a risk of generating an undesirable toxic effect in the patient's cells. As a preliminary assessment of these issues, compounds can be tested for their ability to disrupt B. subtilis growth while failing to affect E. coli cultures (which use SAM but lack SAM riboswitches). To screen for lead compound candidates, parallel bacterial cultures can be grown as follows:

1. B. subtilis can be cultured in glucose minimal media in the absence of exogenously supplied SAM analogs.

2. B. subtilis can be cultured in glucose minimal media in the presence of exogenously supplied SAM analogs (high doses can be selected, to be followed by repeated experiments designed to test a concentration range of the putative drug compound).

3. E. coli can be cultured in glucose minimal media in the presence of exogenously supplied SAM analogs (high doses can be selected, to be followed by repeated experiments designed to test a concentration range of the putative drug compound).

Fitness of the various cultures can be compared by measurement of cellular doubling times. A range of concentrations for the drug compounds can be tested using cultures grown in microtiter plates and analyzed using a microplate reader from another laboratory. Culture 1 is expected to grow well. Drugs that inhibit culture 2 may or may not inhibit growth of culture 3. Drugs that similarly inhibit both culture 2 and culture 3 upon exposure to a wide range of drug concentrations can reflect general toxicity induced by the exogenous compound (i.e., inhibition of many different cellular processes, in addition to or in place of riboswitch inhibition). Successful drug candidates identified in this screen will inhibit E. coli only at very high doses, if at all, and will inhibit B. subtilis at much (>10-fold) lower concentrations.
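The comparison described above can be sketched in a few lines. This is an illustrative outline only: it assumes exponential growth between two optical density readings, and the function names, the example readings, and the use of a single "inhibitory concentration" per species are hypothetical simplifications. Only the greater than 10-fold selectivity criterion comes from the text.

```python
import math


def doubling_time(od_start, od_end, hours):
    """Doubling time in hours, assuming exponential growth between readings."""
    if od_end <= od_start:
        return math.inf  # no net growth over the interval
    return hours * math.log(2) / math.log(od_end / od_start)


def is_selective(inhib_conc_bsub, inhib_conc_ecoli, min_fold=10.0):
    """True if B. subtilis is inhibited at a >min_fold lower concentration.

    Pass None for a species that was not inhibited at any tested dose.
    """
    if inhib_conc_bsub is None:
        return False  # compound does not inhibit the riboswitch-bearing species
    if inhib_conc_ecoli is None:
        return True   # E. coli unaffected at all tested doses
    return inhib_conc_ecoli / inhib_conc_bsub > min_fold


# Hypothetical readings: OD 0.05 -> 0.40 in 3 h is three doublings.
print(doubling_time(0.05, 0.40, 3.0))
print(is_selective(2.0, 50.0))   # 25-fold window
print(is_selective(2.0, 5.0))    # 2.5-fold window, likely general toxicity
```

A compound that fails the selectivity check because both species are inhibited at similar doses would fall into the general-toxicity category described above.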

As derivatization points on SAM are identified, efficient identification of lead drug compounds will require larger-scale screening of appropriate SAM analogs or generic chemical libraries. A high-throughput screen can be created by one of two different methods using nucleic acid engineering principles. Adaptation of both fluorescent sensor designs outlined below to formats that are compatible with high-throughput screening assays can be accommodated by using immobilization methods or solution-based methods.

One way to create a reporter is to add a third function to the riboswitch by adding a domain that catalyzes the release of a fluorescent tag upon SAM binding to the riboswitch domain. In the final reporter construct, this catalytic domain can be linked to the yitJ SAM riboswitch through a communication module that relays the ligand binding event by allowing the correct folding of the catalytic domain for generating the fluorescent signal. This can be accomplished as outlined below.

SAM RiboReporter Pool Design: A DNA template for in vitro transcription to RNA (FIG. 10) has been constructed by PCR amplification using the appropriate DNA template and primer sequences. In this construct, stem II of the hammerhead (stem P1 of the SAM aptamer) has been randomized to present more than 250 million possible sequence combinations, wherein some inevitably will permit function of the ribozyme only when the aptamer is occupied by SAM or a related high-affinity analog. Each molecule in the population of constructs is identical in sequence except at the random domain, where multiple copies of every possible combination of sequence will be represented in the population.
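The pool-size claim can be checked with simple arithmetic. The passage does not state how many positions were randomized; the sketch below assumes 14 fully randomized positions because 4^14 is the smallest power of four exceeding 250 million, and it uses the roughly 100 pmoles of RNA per round given in the selection protocol. Both the position count and the function names are assumptions for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def pool_complexity(n_random):
    """Number of distinct sequences for n fully randomized positions (A, C, G, U)."""
    return 4 ** n_random


def copies_per_sequence(pmol, n_random):
    """Average copies of each distinct sequence in `pmol` of pool RNA."""
    molecules = pmol * 1e-12 * AVOGADRO
    return molecules / pool_complexity(n_random)


n = 14  # assumed: smallest n with 4**n > 250 million
print(pool_complexity(n - 1))   # 67108864, below 250 million
print(pool_complexity(n))       # 268435456, above 250 million
print(f"{copies_per_sequence(100, n):.3g} copies of each sequence in 100 pmol")
```

Under these assumptions each sequence variant would be present in the range of hundreds of thousands of copies, which is consistent with the statement that multiple copies of every combination are represented.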

SAM RiboReporter Selection: The in vitro selection protocol can be a repetitive iteration of the following steps:

1. Transcribe RNA in vitro by standard methods. Include [α-³²P] UTP to incorporate radioactivity throughout the RNA.

2. Purify full length RNA on denaturing PAGE by standard methods.

3. Incubate full length RNA (~100 pmoles) in negative selection buffer containing sufficient magnesium for catalytic activity (20 mM) but no SAM. Incubate 4 h at room temperature (~23° C.), with thermocycling or alkaline denaturation as needed to preclude the emergence of selfish molecules.

4. Purify full length RNA on denaturing PAGE and discard RNAs that react in the absence of SAM.

5. Incubate in positive selection buffer containing 20 mM Mg²⁺ and SAM (pH 7.5 at 23° C.). Incubate 20 min at room temperature.

6. Purify cleaved RNA on denaturing PAGE to recover switches that bound SAM and allowed self-cleavage of the RNA.

7. Reverse transcribe RNA to DNA.

8. PCR amplify DNA with primers that reintroduce the cleaved portion of the RNA.

The concentration of SAM in the positive selection (step 5) can be 100 μM initially and can be reduced as the selection proceeds. The progress of recovering successful communication modules can be assessed by the amount of cleavage observed on the purification gel in step 6. The selection endpoint can be either when the population approaches 100% cleavage in 10 nM SAM (conditions for maximal activity of the parental ribozyme and riboswitch) or when the population approaches a plateau in activity that does not improve over multiple rounds. The end population can then be sequenced. Individual communication module clones can be assayed for generation of a fluorescent signal in the screening construct in the presence of SAM.

A fluorescent signal can also be generated by riboswitch-mediated triggering of a molecular beacon. In this design, riboswitch conformational changes cause a folded molecular beacon tagged with both a fluor and a quencher to unfold and force the fluor away from the quencher by forming a helix with the riboswitch. This mechanism is easy to adapt to existing riboswitches, as this method can take advantage of the ligand-mediated formation of terminator and anti-terminator stems that are involved in transcription control.

To use riboswitches to report ligand binding by binding a molecular beacon, the appropriate construct must be determined empirically. The optimum length and nucleotide composition of the molecular beacon and its binding site on the riboswitch can be tested systematically to result in the highest signal-to-noise ratio. The validity of the assay can be determined by comparing apparent relative binding affinities of different SAM analogs to a molecular beacon-coupled riboswitch (determined by rate of fluorescent signal generation) to the binding constants determined by standard in-line probing.

EXAMPLES

A. Example 1: Coenzyme B₁₂ (AdoCbl) Riboswitches

This example describes testing and analysis of a riboswitch that controls gene expression by binding coenzyme B₁₂.

1. Methods

i. Chemicals and Oligonucleotides

Coenzyme B₁₂ (5′-deoxy-5′-adenosylcobalamin or “AdoCbl”) and its analogs methylcobalamin, cobinamide dicyanide, and cyanocobalamin were purchased from Sigma. Tritiated AdoCbl was prepared as described previously (Brown and Zou, Thermolysis of coenzymes B₁₂ at physiological temperatures: activation parameters for cobalt-carbon bond homolysis and a quantitative analysis of the perturbation of the homolysis equilibrium by the ribonucleoside triphosphate reductase from Lactobacillus leichmannii. J. Inorg. Biochem. 77, 185-195 (1999)). For information regarding the AdoCbl analogs N⁶,N⁶-dimethyl-AdoCbl, N⁶-methyl-AdoCbl, N¹-methyl-AdoCbl, 3-deaza-AdoCbl, PurCbl, 2′-deoxy-AdoCbl and 13-epi-AdoCbl, see Toraya, In: Chemistry and Biochemistry of B₁₂. Banerjee, R. Ed. (Wiley, New York) pp. 783-809 (1999).

DNA oligonucleotides were synthesized by the Keck Foundation Biotechnology Resource Center at Yale University. DNAs were purified by denaturing (8 M urea) PAGE and isolated from the gel by crush/soaking in 10 mM Tris-HCl (pH 7.5 at 23° C.), 200 mM NaCl and 1 mM EDTA. The DNA was recovered from the solution by precipitation with ethanol, resuspended in water and stored at −20° C. until use.

ii. RNA Structure Analysis by In-Line Probing

Precursor mRNA leader molecules were prepared by in vitro transcription from templates generated by PCR (see In vivo Expression Constructs and Assays section below) and 5′ ³²P-labeled using methods described previously (Soukup and Breaker, Allosteric nucleic acid catalysts. Curr. Opin. Struct. Biol. 10, 318-325 (2000)). Approximately 20 nM of labeled RNA precursor was incubated as described in the brief description of FIG. 1. Accompanying digestions were carried out using reaction conditions similar to those described previously (Soukup and Breaker, Relationship between internucleotide linkage geometry and the stability of RNA. RNA 5, 1308-1325 (1999)). To prevent light-induced degradation of ligands, incubations were protected from exposure to light by wrapping each tube with aluminum foil.

iii. Equilibrium Dialysis Assays

Each equilibrium dialysis experiment was conducted using a Dispo-Equilibrium Dialyzer (ED-1, Harvard Bioscience) apparatus, wherein two chambers (a and b) each contained 25 μL of equilibration buffer (50 mM Tris-HCl [pH 8.3 at 25° C.], 20 mM MgCl₂). The chambers were separated by a dialysis membrane with a 5,000 Dalton molecular weight cut-off. In each experiment (I-IV, boxed), 100 pmoles of ³H-AdoCbl were included in chamber a, and other additives were included as designated (+) for each chamber. In each step, equilibrations were allowed to proceed for 10 hrs at 25° C. before samples were quantitated or before subsequent manipulations were carried out. Quantitation was achieved by liquid scintillation counting using 5 or 10 μL of solution from each chamber.

Dialysis samples were protected from exposure to light by wrapping each apparatus with aluminum foil.

iv. In Vivo Expression Constructs and Assays

E. coli K-12 strain was used for all btuB-lacZ expression assays and Top10 cells (Invitrogen) were used for plasmid preparation. A DNA (nucleotides −70 to 450) encompassing the btuB leader sequence was amplified as an EcoRI-BamHI fragment by colony PCR from E. coli strain MC4100 (a gift from S. Gottesman, NIH). The wild-type construct and mutant constructs were inserted into plasmid pRS414 (a gift from R. Simons, UCLA; Simons et al., Improved single and multicopy lac-based cloning vectors for protein and operon fusions. Gene 53, 85-96 (1987)), in frame with the 9th codon of lacZ (β-galactosidase). Mutant constructs were generated by a three-step PCR strategy wherein regions upstream and downstream of the mutation site were amplified separately with the appropriate DNA primers that introduced the desired sequence changes. The resulting fragments were purified by agarose gel electrophoresis, and then combined and amplified by PCR using primers that correspond to the ends of the full-length construct. The resulting constructs were cloned and sequenced. Constructs whose sequence was confirmed were used for expression analysis and were used as templates for subsequent preparation of PCR-derived DNAs for in vitro transcription.

The in-frame fusions between various btuB leader sequences and lacZ generated as described above were used to determine the levels of expression by employing a β-galactosidase assay adapted from that described by Miller, In: A Short Course in Bacterial Genetics (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.) p. 72 (1992).

2. Results

Metabolite-dependent conformational changes in the 202-nucleotide leader sequence of the btuB mRNA. FIG. 1A: Separation of spontaneous RNA-cleavage products of the btuB leader using denaturing 10% polyacrylamide gel electrophoresis (PAGE). 5′-³²P-labeled mRNA leader molecules (arrow) were incubated for 41 hr at 25° C. in 20 mM MgCl₂, 50 mM Tris-HCl (pH 8.3 at 25° C.) in the presence (+) or absence (−) of 20 μM of AdoCbl. Lanes containing RNAs that have undergone no reaction, partial digest with alkali, and partial digest with RNase T1 (G-specific cleavage) are identified by NR, ⁻OH, and T1, respectively. The locations of product bands corresponding to cleavage after selected guanosine residues are identified by filled arrowheads. Arrowheads labeled 1 through 8 identify eight of the nine locations that exhibit effector-induced structure modulation, which experience an increase or decrease in the rate of spontaneous RNA cleavage. The image was generated using a phosphorimager (Molecular Dynamics), and cleavage yields were quantitated by using ImageQuant software. FIG. 1B: Sequence and secondary-structure model for the 202-nucleotide leader sequence of btuB mRNA in the presence of AdoCbl. Putative base-paired elements are designated P1 through P9. Complementary nucleotides in the loops of P4 and P9 that have the potential to form a pseudoknot are juxtaposed. Nine specific sites of structure modulation are identified by light blue arrowheads. The asterisks demarcate the boundaries of the B₁₂ box (nucleotides 141-162). The coding region and the 38 nucleotides that reside immediately 5′ of the start codon (nucleotides 241-243) were not included in the 202-nucleotide fragment. The 315-nucleotide fragment includes the 202-nucleotide fragment, the remaining 38 nucleotides of the leader sequence, and the first 75 nucleotides of the coding region.

The btuB mRNA leader forms a saturable binding site for AdoCbl. FIG. 2A: The dependence of spontaneous cleavage of btuB mRNA leader on the concentration of AdoCbl effector as represented by site 1 (G23) and site 2 (U68). 5′-³²P-labeled mRNA leader molecules were incubated, separated, and analyzed as described in the legend to FIG. 1A, and include identical control and marker lanes as indicated. Incubations contained concentrations of AdoCbl ranging from 10 nM to 100 μM (lanes 1 through 8) or did not include AdoCbl (−). FIG. 2B: Composite plot of the fraction of RNA cleaved at six locations along the mRNA leader versus the logarithm of the concentration (c) of AdoCbl. Fraction cleaved values were normalized relative to the highest and lowest cleavage values measured for each location, including the values obtained upon incubation in the absence of AdoCbl. The inset defines the symbols used for each of six sites, while the remaining three sites were excluded from the analysis due to weak or obscured cleavage bands. Filled and open symbols represent increasing and decreasing cleavage yields, respectively, upon increasing the concentration of AdoCbl. The dashed line reflects a K_(D) of ~300 nM, as predicted by the concentration needed to generate half-maximal structural modulation. Data plotted were derived from a single PAGE analysis, of which two representative sections are depicted in FIG. 2A.
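The normalization and half-maximal estimate described for FIG. 2B can be sketched as follows. This is not the analysis actually used to produce the figure: the intermediate concentrations, the baseline and amplitude of the synthetic cleavage values, and the log-linear interpolation are all assumptions chosen to illustrate the idea. The synthetic data are generated from an ideal 1:1 isotherm with K_D set to 300 nM, so the estimate should recover a value near that number.

```python
import math


def normalize(values):
    """Scale values to 0..1 relative to the lowest and highest measurement."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def half_max_concentration(concs, norm):
    """Concentration at which the normalized response crosses 0.5.

    Interpolates linearly on log10(concentration); works for responses that
    increase or decrease with ligand. Returns None if 0.5 is never crossed.
    """
    for i in range(len(concs) - 1):
        y0, y1 = norm[i], norm[i + 1]
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            t = (0.5 - y0) / (y1 - y0)
            log_c = math.log10(concs[i]) + t * (
                math.log10(concs[i + 1]) - math.log10(concs[i])
            )
            return 10 ** log_c
    return None


# Hypothetical titration spanning 10 nM to 100 uM (values in nM).
concs_nm = [10, 30, 100, 300, 1000, 3000, 10000, 100000]
KD_TRUE_NM = 300.0  # used only to generate the synthetic data

# First entry is the no-ligand lane, which is included in the normalization.
raw = [0.02 + 0.10 * c / (c + KD_TRUE_NM) for c in [0.0] + concs_nm]
norm = normalize(raw)

estimate = half_max_concentration(concs_nm, norm[1:])
print(f"apparent K_D is about {estimate:.0f} nM")
```

Because the interpolation handles both rising and falling responses, the same function could be applied to sites whose cleavage yield decreases with ligand.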

The 202-nucleotide mRNA leader causes an unequal distribution of AdoCbl in an equilibrium dialysis apparatus. FIG. 3(I): Equilibration of tritiated effector was conducted in the absence of RNA. FIG. 3(II): (step 1) Equilibration was conducted as in I, but with 200 pmoles of mRNA leader added to chamber b; (step 2) 5,000 pmoles of unlabeled AdoCbl was added to chamber b. FIG. 3(III): Equilibrations were conducted as described in II, but wherein 5,000 pmoles of cyanocobalamin was added to chamber b. FIG. 3(IV): (step 1) Equilibration was initiated as described in step 1 of II; (steps 2 and 3) the solution in chamber a was replaced with 25 μL of fresh equilibration buffer; (step 4) 5,000 pmoles of unlabeled AdoCbl was added to chamber b. The cpm ratio is the ratio of counts detected in chamber b relative to that of a. The dashed line represents a cpm ratio of 1, which is expected if equal distribution of tritium is established.
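The reason RNA in chamber b shifts the cpm ratio above 1 can be illustrated with an idealized calculation. The sketch below assumes 1:1 binding, fully binding-competent RNA confined to chamber b, free ligand at equal concentration in both chambers at equilibrium, no volume change on addition of competitor, and labeled and unlabeled AdoCbl behaving identically. It borrows the apparent K_D of about 300 nM from FIG. 2B. The resulting numbers are model predictions under those assumptions, not the measured values shown in FIG. 3.

```python
import math


def cpm_ratio(ligand_pmol, rna_pmol, kd_um, vol_a_ul=25.0, vol_b_ul=25.0):
    """Predicted ratio of ligand concentration in chamber b to chamber a.

    Solves (R_total - x) * f = K_D * x, where x is the bound complex in
    chamber b and f is the free ligand concentration shared by both chambers.
    Concentrations are in uM (pmol/uL).
    """
    if rna_pmol == 0:
        return 1.0
    rna_um = rna_pmol / vol_b_ul
    total_vol = vol_a_ul + vol_b_ul
    # Quadratic in x: vol_b*x^2 - (rna*vol_b + L + K_D*V_total)*x + rna*L = 0
    a = vol_b_ul
    b = -(rna_um * vol_b_ul + ligand_pmol + kd_um * total_vol)
    c = rna_um * ligand_pmol
    bound_um = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)  # physical root
    free_um = (ligand_pmol - bound_um * vol_b_ul) / total_vol
    return (free_um + bound_um) / free_um


KD_UM = 0.3  # apparent K_D of ~300 nM from FIG. 2B

print(cpm_ratio(100, 0, KD_UM))     # no RNA: equal distribution
print(cpm_ratio(100, 200, KD_UM))   # RNA in chamber b retains tracer
print(cpm_ratio(5100, 200, KD_UM))  # excess unlabeled AdoCbl competes tracer off
```

In this model the ratio is well above 1 with RNA present and returns toward 1 once a large excess of unlabeled ligand saturates the binding sites, which is the qualitative behavior the experiment is designed to reveal.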

Selective molecular recognition of effectors by the btuB mRNA leader. FIG. 4A shows the chemical structures of AdoCbl (1) and various effector analogs (2 through 11). FIG. 4B: Determination of analog binding by monitoring modulation of spontaneous cleavage of the 202-nucleotide btuB RNA leader. 5′-³²P-labeled mRNA leader molecules were incubated, separated, and analyzed as described in the legend to FIG. 1A, and include identical control and marker lanes as indicated. The sections of three PAGE analyses encompassing site 2 (U68) are depicted. Below each image is plotted the amount of RNA cleaved (normalized with relation to the lowest and highest levels of cleavage at U68 in each gel) for each effector as indicated, or for no effector (−). The compound 11 (13-epi-AdoCbl) is an epimer of AdoCbl wherein the configuration at C13 is inverted, so that the e propionamide side chain is above the plane of the corrin ring; see Brown et al., Conformational studies of 5′-deoxyadenosyl-13-epicobalamin, a coenzymatically active structural analog of coenzyme B₁₂. Polyhedron 17, 2213 (1998).

Mutations in the mRNA leader and their effects on AdoCbl binding and genetic control. FIG. 5A: Sequence of the putative P5 element of the wild-type 202-nucleotide btuB leader exhibits AdoCbl-dependent modulation of structure as indicated by the observed increase in spontaneous RNA cleavage at position U68 (10% denaturing PAGE gel). Assays were conducted in the absence (−) or presence (+) of 5 μM AdoCbl. The remaining lanes are as described in the legend to FIG. 1A. The composite bar graph reflects the ability of the RNA to shift the equilibrium of AdoCbl in an equilibrium dialysis apparatus and the ability of a reporter gene (see Experimental Procedures) to be regulated by AdoCbl addition to a bacterial culture. (Left) Plotted is the cpm ratio derived by equilibrium dialysis, wherein chamber b contains the RNA. Details of the equilibrium dialysis experiments are described in the brief description of FIG. 3. (Right) Plotted are the expression levels of β-galactosidase as determined from cells grown in the absence (−) or presence (+) of 5 μM AdoCbl. Boxed numbers on the left and right, respectively, reflect the approximate K_(D) and the fold repression of β-galactosidase activity in the presence of AdoCbl. N.D. designates not determined. FIGS. 5B-5F: Sequences and performance characteristics of various mutant leader sequences as indicated. Constructs were created as described in the Experimental Procedures section.

i. Metabolite-Induced Structure Modulation of a Messenger RNA.

To assess whether the btuB leader sequence alone is sufficient forsensing and responding to a metabolite, a molecular probing strategy wasemployed that relies on the structure-dependent spontaneous cleavage ofRNA (Soukup and Breaker, Relationship between internucleotide linkagegeometry and the stability of P, -NA. RNA 5, 1308-1325 (1999); Soukup etal., Generating new ligand-binding RNAs by affinity maturation anddisintegration of allosteric ribozymes. RNA 7, 524-536 (2001)). Theprincipal mechanism by which an RNA phosphodiester linkage isspontaneously cleaved involves an internal nucleophilic attack by the2′-oxygen on the adjacent phosphorus center. Since the precise “in-line”positioning of the U-oxygen, phosphorus, and 5′-oxygen atoms of a givenRNA linkage is essential for a productive nucleophilic attack to occur(Soukup and Breaker, Relationship between internucleotide linkagegeometry and the stability of P, -NA. RNA 5, 1308-1325 (1999); Soukup etal., Generating new ligand-binding RNAs by affinity maturation anddisintegration of allosteric ribozymes. RNA 7, 524-536 (2001);Westheimer, Pseudo-rotation in the hydrolysis of phosphate esters. Acc.Chem. Res. 1, 70-78 (1968); Usher, On the mechanism of ribonucleaseaction. Proc. Natl. Acad. USA 62, 661-667 (1969); Usher and McHale,Hydrolytic stability of helical RNA: a selective advantage for thenatural 3′,5′-bond. Proc. Natl. Acad. USA 73, 1149-1153 (1976);Dock-Bregeon and Moras, Conformational changes and dynamics of tRNAs:evidence from hydrolysis patterns. Cold Spring Harbor Symp. Quant. Biol.52, 113-121 (1987)), the rate at which spontaneous cleavage occurs at agiven linkage is highly dependent upon the secondary and tertiarystructure of the RNA. 
Specifically, RNA linkages that are formed by nucleotides involved in stable base-paired structures rarely undergo spontaneous cleavage because they rarely adopt an in-line conformation, while nucleotides located in relatively unstructured regions or in tertiary-structured regions experience far greater levels of spontaneous cleavage. Thus, probing of an RNA receptor in the absence and presence of its ligand can be used to provide evidence for RNA structural models and even to determine the dissociation constant for a given RNA-ligand interaction (Soukup and Breaker, Relationship between internucleotide linkage geometry and the stability of RNA. RNA 5, 1308-1325 (1999); Soukup et al., Generating new ligand-binding RNAs by affinity maturation and disintegration of allosteric ribozymes. RNA 7, 524-536 (2001)).

A preparation of RNAs that encompass nucleotides 1 through 202 of the 5′-untranslated region of the btuB mRNA (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000); Lundrigan et al., Transcribed sequences of the Escherichia coli btuB gene control its expression and regulation by vitamin B₁₂. Proc. Natl. Acad. Sci. USA 88, 1479-1483 (1991)) was subjected to in-line probing (FIG. 1). In the absence of the putative AdoCbl effector, the RNA exhibits a distinct pattern of cleavage products that is indicative of a well-ordered conformational state, which has a mixture of stable structural elements interspersed with regions that are mostly unstructured (FIG. 1A). In the presence of AdoCbl, the pattern of cleavage changes at eight locations, while a ninth position of structural modulation (FIG. 1B) is observed when a longer portion of the mRNA is used. Specifically, metabolite-induced structural modulation at nucleotide 202 (FIG. 1B, position 9) was observed by using in-line probing of a fragment that encompasses nucleotides 1 through 315 of the btuB mRNA (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000)). Positions 1, 3, 4, 8, and 9 undergo an effector-dependent dampening of spontaneous cleavage while the remaining sites experience the reverse effect. A similar pattern of metabolite-modulated RNA cleavage was observed with the analogous 206-nucleotide btuB leader RNA of S. typhimurium (Wei et al., Res. Microbiol. 143, 459 (1992)).

These effector-modulated sites are mapped on a secondary-structure model that was generated by using a combination of computational and RNA probing data. An RNA secondary-structure prediction algorithm (Zuker et al., Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (eds. Barciszewski, J., and Clark, B. F. C.) pp. 11-43 (NATO ASI Series, Kluwer Academic Publishers) (1999)) supports a model wherein nine base-paired elements are formed. The in-line probing data and preliminary mutational analyses are consistent with eight of these pairing interactions (P1-P4 and P6-P9), while an alternative pairing interaction (P5) is supported (see below). The majority of these putative base-paired elements appear to remain intact upon effector-induced modulation, with the notable exception of P9. The importance of this structural element in the modulation of ribosome binding and translation has been previously established by mutational analysis (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000)). Metabolite-dependent formation of the P9 stem-loop structure appears to be critical for the down-regulation of mRNA translation. Consistent with this hypothesis is the observed increase in structure formation in this location upon the addition of AdoCbl (FIG. 1B, decreased cleavage at positions 8 and 9).

ii. A Saturable Metabolite-Binding Site is Formed by a Messenger RNA.

If the structural alteration of the mRNA leader is induced selectively by AdoCbl (as opposed to modulation by a non-specific effect), then the RNA should exhibit characteristics of a typical receptor-ligand interaction. Thus, a plot of the relative extents of structural modulation at each site is expected to yield an apparent dissociation constant (apparent K_(D)) for the effector, which reflects the concentration of effector needed to convert half of the RNAs into their altered structural state. Furthermore, if a single binding event brings about the global structural changes that are observed, then the individual K_(D) values calculated for each modulation site should converge on a single value, while these values are likely to vary if the structural modulation results from non-specific effects.

Indeed, the levels of spontaneous RNA cleavage were found to correlate with the concentrations of AdoCbl added to the in-line probing mixtures (FIG. 2A). Examination of the dependency of the six most prominent sites of modulation on effector concentration reveals similar apparent K_(D) values of approximately 300 nM at 25° C. (FIG. 2B). This value is comparable to an apparent K_(D) value derived from a previous assay that examined the AdoCbl-dependent binding of ribosomes to the btuB mRNA (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000)). Moreover, the fact that structural modulation occurs over a broad range of concentrations of AdoCbl suggests that this RNA is not likely to make use of cooperative binding of multiple effectors, which would result in a more substantial response to small changes in effector concentration. Together, these observations indicate that the mRNA leader undergoes a substantial change in conformation and forms a high-affinity binding pocket for AdoCbl.
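The argument that a broad response range points away from cooperative binding can be illustrated with a minimal numerical sketch. It assumes a generic Hill-type binding model, which the text does not state explicitly; the function names and the Hill coefficient values are hypothetical, and only the apparent K_(D) of approximately 300 nM comes from the description above.

```python
def fraction_modulated(ligand_nM, kd_nM, hill_n=1.0):
    """Fraction of RNA in the ligand-bound state under a Hill-type model.

    hill_n = 1 corresponds to a simple 1:1 receptor-ligand interaction;
    hill_n > 1 is an assumed stand-in for cooperative binding.
    """
    x = (ligand_nM / kd_nM) ** hill_n
    return x / (1.0 + x)


def ligand_for_fraction(fraction, kd_nM, hill_n=1.0):
    """Ligand concentration (nM) needed to reach a given bound fraction."""
    return kd_nM * (fraction / (1.0 - fraction)) ** (1.0 / hill_n)


def response_window(kd_nM, hill_n=1.0):
    """Fold change in ligand needed to go from 10% to 90% modulation."""
    low = ligand_for_fraction(0.1, kd_nM, hill_n)
    high = ligand_for_fraction(0.9, kd_nM, hill_n)
    return high / low


if __name__ == "__main__":
    kd = 300.0  # apparent K_D in nM, as reported for the btuB leader
    print("half-maximal point:", fraction_modulated(kd, kd))
    for n in (1.0, 2.0, 4.0):  # hypothetical Hill coefficients
        print(f"n = {n}: 10%-90% window spans {response_window(kd, n):.1f}-fold")
```

Under this model a non-cooperative site needs an 81-fold change in effector concentration to move from 10% to 90% modulation, whereas a Hill coefficient of 2 narrows the window to 9-fold, which is the sharper response the text associates with cooperative binding.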

To provide further support for this conclusion, equilibrium dialysis was used to determine whether the RNA could selectively generate an unequal distribution of tritiated AdoCbl (³H-AdoCbl) when incubated in a two-chamber dialysis system. As expected, addition of ³H-AdoCbl to chamber a of an equilibrium dialysis assembly results in near equal distribution of tritium (cpm ratio ~1) between chambers a and b upon incubation (FIG. 3, experiment I). However, the addition of the 202-nucleotide mRNA leader to chamber b causes a shift in the equilibrium of ³H-AdoCbl (cpm ratio ~2) in favor of chamber b (FIG. 3, experiments II and III). Importantly, the subsequent addition of an excess of unlabeled AdoCbl restores equal distribution of tritium between the two chambers, while the addition of an excess of cyanocobalamin (vitamin B₁₂, an analog of AdoCbl) does not restore the ratio of tritium to unity. Excess unlabeled AdoCbl is expected to restore equal distribution by serving to occupy the vast majority of the binding sites formed by the btuB RNA. In contrast, cyanocobalamin is known to be incapable of serving as a regulatory effector for btuB expression in E. coli (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000); Lundrigan and Kadner, Altered cobalamin metabolism in Escherichia coli btuR mutants affects btuB gene regulation. J. Bacteriol. 171, 154-161 (1989)), and thus should be ignored as an effector by the RNA. These findings are consistent with the conclusion that the RNA directly binds AdoCbl and indicate that the RNA forms a selective binding pocket that excludes certain analog compounds.

Assuming that a 1:1 complex is formed between effector and RNA, it was expected that equilibrium dialysis would produce a cpm ratio of far greater than 2 under the assay conditions (2-fold excess RNA over ³H-AdoCbl and concentrations of RNA and effector in excess of the apparent K_(D)). Since there should be an excess of binding sites, the majority of the tritium should be shifted to chamber b upon equilibration. However, the data suggest that ~70% of the tritium in the sample used is not in the form of ³H-AdoCbl. For example, successive replacement of the buffer in chamber a (which removes unshifted tritium from the equilibrium dialysis system) results in increasing values for the cpm ratio (FIG. 3; experiment IV). In addition, the tritium that remains in chamber a upon equilibration with RNA in chamber b cannot be induced to yield an unequal distribution of tritium by btuB RNA in a subsequent equilibrium dialysis experiment (data not shown). The source of this unbound tritium is most likely light-mediated degradation of AdoCbl, which is highly unstable under ambient light conditions. Mass spectrum analysis of ³H-AdoCbl reveals that the sample is almost entirely intact in the absence of light exposure, but yields ~70% degradation upon exposure to light for a time (about 20 sec) that is typically experienced by a sample when establishing an equilibrium dialysis experiment.
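The reasoning in this paragraph can be checked with a small mass-balance sketch. It assumes two chambers of equal volume, a freely diffusing ligand, RNA confined to chamber b, and 1:1 binding; the concentrations used (1 μM tritiated material, 2 μM RNA) are hypothetical placeholders, since the text specifies only a 2-fold excess of RNA, an apparent K_(D) near 300 nM, and an estimated ~70% of tritium that cannot bind.

```python
import math


def free_ligand(total_active, rna, kd):
    """Free ligand concentration at equilibrium (same units as the inputs).

    Mass balance for equal-volume chambers, ligand loaded into chamber a:
        total_active = 2*f + rna*f/(kd + f)
    which rearranges to 2*f**2 + (2*kd + rna - total_active)*f - total_active*kd = 0.
    """
    b = 2.0 * kd + rna - total_active
    disc = b * b + 8.0 * total_active * kd
    return (-b + math.sqrt(disc)) / 4.0


def cpm_ratio(total_tritium, rna, kd, inert_fraction=0.0):
    """Ratio of tritium in chamber b (with RNA) to chamber a.

    inert_fraction is the share of tritium that cannot bind the RNA
    (for example, degradation products); it distributes equally.
    """
    active = total_tritium * (1.0 - inert_fraction)
    inert_per_chamber = total_tritium * inert_fraction / 2.0
    f = free_ligand(active, rna, kd)
    bound = rna * f / (kd + f)
    chamber_a = inert_per_chamber + f
    chamber_b = inert_per_chamber + f + bound
    return chamber_b / chamber_a


if __name__ == "__main__":
    # Hypothetical concentrations in micromolar; K_D of 0.3 uM from the text.
    print("fully intact ligand:", round(cpm_ratio(1.0, 2.0, 0.3), 2))
    print("70% inert tritium:  ", round(cpm_ratio(1.0, 2.0, 0.3, 0.7), 2))
```

With these assumed inputs the fully intact case gives a ratio of about 5.4, far above 2, while treating 70% of the tritium as non-binding lowers it to about 1.6, in the neighborhood of the observed value of ~2.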

iii. The btuB mRNA Leader Selectively Binds AdoCbl.

To provide selectivity for the genetic response, the btuB mRNA leader must form a precise binding pocket for AdoCbl in order to preclude the genetic switch from being triggered by other metabolites. To explore the molecular recognition capabilities of this RNA, the binding affinity of AdoCbl relative to 10 analogs was indirectly determined (FIG. 4A). This was achieved by determining the extent of spontaneous cleavage at site 2 (nucleotide U68) upon incubation in the presence of AdoCbl or of various analogs (FIG. 4B). It was found that the RNA fails to undergo structural modulation when cobalamin compounds lack the 5′-deoxy-5′-adenosyl moiety. The importance of individual functional groups on this moiety is revealed by the function of other analogs. In summary, modifications at the N1, N3, and N6 positions of the adenine ring cause significant disruption of binding, while the 2′-hydroxyl group of the adjoining ribose moiety is not an important molecular recognition element. Interestingly, a change in the stereochemistry at position 13 of the corrin ring (compound 11) renders the molecule inactive as a regulatory effector in this in vitro assay and also inside cells. These findings indicate that the btuB mRNA leader forms a binding pocket for AdoCbl and that the RNA makes numerous contacts with the effector to ensure high molecular specificity.

iv. Disruption of Metabolite-RNA Binding has Consequences for Genetic Control.

The presence of AdoCbl causes reductions in ribosome binding and translation efficiency of the btuB mRNA (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000)). The results indicate that this genetic control process is mediated by the selective binding of AdoCbl to the btuB mRNA. The effector-binding function of mutant RNA leaders in vitro was compared with their ability to support effector-induced genetic control inside cells. As expected, the wild-type mRNA leader exhibits effector-induced structure modulation, induces an unequal distribution of ³H-AdoCbl in an equilibrium dialysis system, and permits down-regulation of a reporter gene in E. coli cells treated with AdoCbl and harboring the appropriate reporter construct (summarized in FIG. 5A). However, the introduction of a single mutation (A150T) in the evolutionarily conserved “B₁₂ box” (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000)) completely eliminates the in vitro effector-binding and in vivo gene-control functions of this construct, termed “m1” (FIG. 5B), which is consistent with the necessity of effector binding for genetic control.

Mutations that disrupt (U73G, G74U) and subsequently restore (U73G, G74U, C114A, A115C) the predicted P5 stem element were examined. The disrupted stem in construct m2 causes a reduction of AdoCbl binding affinity in vitro and a corresponding reduction of genetic control in cell assays (FIG. 5C), while restoration of the P5 stem element (construct m3) results in near wild-type functions for binding and genetic control (FIG. 5D). This indicates that the P5 stem is an important structural element for function of the RNA. Interestingly, potentially disruptive (m4) and restorative (m5) mutations in a possible pseudoknot structure between the P4 and P9 loops (FIG. 1B) both result in a reduction in binding affinity (K_(D) ~5 μM). If a pseudoknot is being formed, this structure might require a specific sequence for proper function. Although these RNAs maintain diminished but detectable levels of effector binding, neither exhibits genetic control upon the addition of AdoCbl to bacterial cultures harboring the corresponding reporter constructs. The loss in binding affinity likely is sufficient to place these mutant RNAs out of the physiological range for effector concentration, as the cells still retain their natural btuB gene whose regulatory system continues to control the import of AdoCbl. The findings support the hypothesis that mRNAs have the structural and functional sophistication needed to perform precision genetic control in the absence of protein regulatory elements.

v. Analysis

Genetic control by mRNAs that directly sense the concentrations of metabolites is a newly established paradigm for monitoring the status of cellular metabolism. Although sensing of aminoacyl tRNAs in prokaryotes also appears to be achieved by direct binding of tRNAs to the 5′-untranslated region of their corresponding aminoacyl tRNA synthetases (Henkin, tRNA-directed transcription antitermination. Mol. Microbiol. 3, 381-387 (1994)), binding appears to be mediated by Watson/Crick base pairing. In the case of btuB, the mRNA directly binds the AdoCbl effector and becomes resistant to translation initiation, presumably by preventing ribosome binding (Nou and Kadner, Adenosylcobalamin inhibits ribosome binding to btuB RNA. Proc. Natl. Acad. Sci. USA 97, 7190-7195 (2000)). If no protein receptors are required for molecular recognition or for modulating gene expression, then this simple “riboswitch” mechanism is most economical in its architecture. Given the organizational simplicity of the btuB genetic control components compared to analogous systems that involve proteins, it is likely that mRNAs could be more easily engineered to respond directly to natural and non-biological regulatory effectors.

It is possible that variations of this mechanism involving direct contacts between metabolite and mRNA are far more widespread in genetic circuitry. For example, the S. typhimurium cob operon, which encodes proteins in the biosynthetic pathway for the AdoCbl coenzyme, carries a B₁₂ box and other regulatory structures in its leader domain (Ravnum and Andersson, An adenosyl-cobalamin (coenzyme-B₁₂)-repressed translational enhancer in the cob mRNA of Salmonella typhimurium. Mol. Microbiol. 39, 1585-1594 (2001)). It has been noted (White III, Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 7, 101-104 (1976)) that these two coenzymes and FMN, which is another potential riboswitch effector (Gelfand et al., A conserved RNA structure element involved in the regulation of bacterial riboflavin synthesis genes. Trends Genetics 15, 439-442 (1999)), possibly are molecular fossils of an ancient metabolic state that was run entirely by RNA. If true, then mechanisms involving metabolite sensing by mRNA might be one of the oldest forms of genetic control in existence.

B. Example 2 Thiamine Pyrophosphate (TPP) Riboswitches

This example describes testing and analysis of a riboswitch that controls gene expression by binding thiamine pyrophosphate.

1. Chemicals and Oligonucleotides

TPP, thiamine monophosphate (TP), thiamine, oxythiamine, amprolium, and benfotiamine were purchased from Sigma. Thiamine disulfide and 4-methyl-5-β-hydroxyethylthiazole (THZ) were purchased from TCI America. ³H-labeled thiamine was purchased from American Radiolabeled Chemicals, Inc. (10 Ci mmol⁻¹). Synthetic DNAs were synthesized by the Keck Foundation Biotechnology Resource Center at Yale University. DNAs were purified by denaturing (8 M urea) polyacrylamide gel electrophoresis (PAGE) and isolated from the gel by crush-soaking in 10 mM Tris-HCl (pH 7.5 at 23° C.), 200 mM NaCl and 1 mM EDTA. The DNA was recovered by precipitation with ethanol.

2. Construction of E. coli thiM- and E. coli thiC-lacZ Fusions

Nucleotides −83 to 238 of the E. coli thiCEFGH operon (Vander Horn et al., Structural genes for thiamine biosynthetic enzymes (thiCEFGH) in Escherichia coli K-12. J. Bacteriology 175, 982-992 (1993)) were amplified by PCR from E. coli strain MC4100 (obtained from S. Gottesman, NIH) as an EcoR1-Bgl II fragment. The DNA was ligated into EcoR1- and BamH1-digested pRS414 plasmid DNA, which contains a promoterless copy of lacZ (obtained from R. Simons, UCLA; Simons et al., Improved single and multicopy lac-based cloning vectors for protein and operon fusions. Gene 53, 85-96 (1987)), resulting in the in-frame fusion of the 9^(th) codon of lacZ to the 9^(th) codon of thiC. Similarly, the regulatory region of thiM (nucleotides −67 to 163) was amplified by PCR as an EcoR1-BamH1 fragment and inserted into plasmid pRS414, wherein the 6^(th) codon of thiM resides in-frame with the 9^(th) codon of lacZ. The plasmids were transformed into Top10 cells (Invitrogen) for all subsequent manipulations. All site-directed mutations were introduced into the thiC and thiM regulatory regions using the QuikChange site-directed mutagenesis kit (Stratagene) and the appropriate mutagenic DNA primers. All mutations were confirmed by DNA sequencing (USB Thermosequenase).

3. Thiamine-Repression β-Galactosidase Assays

E. coli cells (Top10; Invitrogen) that contained in-frame lacZ fusions to thiC or thiM mRNA leader sequences were grown in M9 glucose minimal media (plus 50 μg/ml Vitamin assay Casamino acids; Difco) to mid-exponential phase. The cultures were grown either with or without added thiamine (100 μM). Aliquots (1 mL) were removed for β-galactosidase enzyme assays, which were conducted in a manner similar to that described by Miller (Miller, In: A Short Course in Bacterial Genetics, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., p. 72 (1992)). All assays were repeated twice and in duplicate, with Miller unit values reflecting the average of these analyses.
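For orientation, the conventional Miller unit calculation and the fold-repression ratio derived from it can be sketched as follows. The formula is the standard one from Miller's protocol; the text states only that the assays were conducted in a similar manner, so the exact readings, reaction times, and volumes below are hypothetical.

```python
def miller_units(od420, od550, od600, minutes, volume_ml):
    """Standard Miller units of beta-galactosidase activity.

    od420: absorbance of the reaction (o-nitrophenol plus light scatter)
    od550: absorbance used to correct for scatter from cell debris
    od600: cell density of the culture at sampling
    minutes: reaction time; volume_ml: culture volume used in the assay
    """
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


def fold_repression(units_without, units_with):
    """Ratio of reporter activity without effector to activity with effector."""
    return units_without / units_with


if __name__ == "__main__":
    # Hypothetical readings for one reporter strain grown with and without thiamine.
    minus = miller_units(od420=0.90, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
    plus = miller_units(od420=0.05, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
    print("Miller units without thiamine:", minus)
    print("Miller units with thiamine:   ", plus)
    print("fold repression:", fold_repression(minus, plus))
```

Averaging replicate values before taking the ratio, as the text describes, would simply mean passing mean Miller units to `fold_repression`.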

4. In Vitro Transcription

Templates for in vitro transcription of the fragments of thiC and thiM mRNA leaders were generated by PCR using the appropriate DNA primers and plasmids pRS414thiC or pRS414thiM, respectively. The dinucleotide sequence GG was introduced into the DNA constructs (corresponding to the 5′ terminus of each RNA construct) at this step to facilitate transcription by T7 RNA polymerase. RNAs were prepared by in vitro transcription and were 5′ ³²P-labeled as described previously (Seetharaman et al., Immobilized riboswitches for the analysis of complex chemical and biological mixtures. Nature Biotechnol. 19, 336-341 (2001)).

5. In-Line Probing of RNA

Determination of apparent K_(D) values for each construct was achieved by conducting in-line probing of RNA constructs wherein the concentration of the ligand was varied between 10 nM and 100 μM, or up to 10 mM for weakly binding ligands. Specifically, TPP-dependent modulation of the spontaneous cleavage of RNA constructs was visualized by polyacrylamide gel electrophoresis (PAGE). 5′ ³²P-labeled RNAs (20 nM) were incubated for approximately 40 hr at 25° C. in 20 mM MgCl₂, 50 mM Tris-HCl (pH 8.3 at 25° C.) in the presence (+) or absence (−) of 100 μM TPP. Some RNAs were subjected to no reaction, partial digestion with alkali, or partial digestion with RNase T1 (G-specific cleavage) (see FIG. 6A). Composite plots of the fraction of RNA cleaved at specific sites versus the logarithm of the concentration of ligand (e.g. FIG. 7A) were generated to provide an estimate of the apparent K_(D). Fraction cleaved values were normalized relative to the highest and lowest cleavage values measured for each site.
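The normalization and K_(D) estimation described above can be sketched in a few lines. This is an illustrative reconstruction, not the procedure actually used: it assumes a 1:1 binding curve and a simple grid search over log-spaced K_(D) values, and the titration data are synthetic values generated for a site whose cleavage falls as ligand is added.

```python
def normalize(values):
    """Scale fraction-cleaved values to the 0-1 range using the site's min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def predicted(ligand, kd, decreasing=True):
    """Normalized cleavage expected for 1:1 binding at a given ligand concentration."""
    bound = ligand / (ligand + kd)
    return 1.0 - bound if decreasing else bound


def estimate_kd(ligands, normalized, decreasing=True,
                log10_min=-9.0, log10_max=-2.0, steps=701):
    """Grid search (log-spaced, molar units) for the K_D with least squared error."""
    best_kd, best_err = None, float("inf")
    for i in range(steps):
        kd = 10.0 ** (log10_min + (log10_max - log10_min) * i / (steps - 1))
        err = sum((predicted(l, kd, decreasing) - y) ** 2
                  for l, y in zip(ligands, normalized))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


if __name__ == "__main__":
    # Synthetic titration (molar): cleavage drops from 0.30 to 0.05 with K_D = 600 nM.
    ligands = [0.0, 1e-8, 1e-7, 3e-7, 1e-6, 3e-6, 1e-5, 1e-4]
    true_kd = 600e-9
    raw = [0.05 + 0.25 * (1.0 - l / (l + true_kd)) for l in ligands]
    kd = estimate_kd(ligands, normalize(raw))
    print(f"estimated apparent K_D: {kd * 1e9:.0f} nM")
```

Because normalization pins the highest ligand concentration to full modulation, the estimate is only as good as the assumption that the titration reaches saturation; repeating the fit for several sites and checking that the values agree mirrors the convergence argument made for the btuB leader.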

6. Equilibrium Dialysis

Equilibrium dialysis assays were conducted using a DispoEquilibrium Dialyzer (ED-1, Harvard Bioscience), wherein chambers a and b were separated by a 5,000 Dalton molecular weight cut-off membrane. Equilibration was initiated by the addition of 25 μL of equilibration buffer [50 mM Tris-HCl (pH 8.3 at 25° C.), 20 mM MgCl₂, 100 mM KCl] containing 100 nM ³H-thiamine to chamber a, and by the addition of an equal volume of equilibration buffer either without or with 20 μM RNA, as indicated, to chamber b. Equilibrations were allowed to proceed for 10 hr at 23° C., and aliquots were removed from each chamber and quantitated by using a liquid scintillation counter.

7. Results

i. Metabolite Binding by mRNAs.

FIG. 6A shows TPP-dependent modulation of the spontaneous cleavage of 165 thiM RNA as visualized by polyacrylamide gel electrophoresis (PAGE). 5′ ³²P-labeled RNAs (arrow, 20 nM) were incubated for approximately 40 hr at 25° C. in 20 mM MgCl₂, 50 mM Tris-HCl (pH 8.3 at 25° C.) in the presence (+) or absence (−) of 100 μM TPP. NR, ⁻OH and T1 represent RNAs subjected to no reaction, partial digestion with alkali, or partial digestion with RNase T1 (G-specific cleavage), respectively. Product bands representing cleavage after selected G residues are numbered and identified by filled arrowheads. The asterisk identifies modulation of RNA structure involving the Shine-Dalgarno (SD) sequence. Gel separations were analyzed using a phosphorimager (Molecular Dynamics) and quantitated using ImageQuant software.

FIG. 6B shows a secondary-structure model of 165 thiM as predicted by computer modeling (Zuker et al., Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (eds. Barciszewski J. & Clark, B. F. C.) 11-43 (NATO ASI Series, Kluwer Academic Publishers, 1999); Mathews et al., Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911-940 (1999)) and by the structure probing data depicted in FIG. 6A. Spontaneous cleavage characteristics are as noted in the inset. Unmarked nucleotides exhibit a constant but low level of degradation. The truncated 91 thiM RNA is boxed and the thi box element (Miranda-Rios et al., A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. USA 98, 9736-9741 (2001)) is shaded. Nucleotides enclosed in boxes identify an alternative pairing, designated P8*. The RNA carries two mutations (G156A and U157C) relative to wild type that were introduced in a non-essential portion of the construct to form a restriction site for cloning, while all RNAs carry two 5′-terminal G residues to facilitate in vitro transcription.

FIG. 6C shows TPP-dependent modulation of the spontaneous cleavage of 240 thiC RNA. Reactions were conducted and analyzed as described above for FIG. 6A. FIG. 6D shows a secondary-structure model of 240 thiC. Base-paired elements that are similar to those of thiM are labeled P1 through P5. The truncated RNA 111 thiC is boxed. Nucleotides enclosed in boxes identify an alternative pairing.

ii. The thiM and thiC mRNA Leaders Serve as High-Affinity Metabolite Receptors.

FIG. 7A shows the extent of modulation of spontaneous RNA cleavage at several sites within 165 thiM (left) and 240 thiC (right) plotted for different concentrations (c) of TPP. Arrows reflect the estimated concentration of TPP needed to attain half-maximal modulation of RNA (apparent K_(D)). FIG. 7B shows the logarithm of the apparent K_(D) values plotted for both RNAs with TPP, TP and thiamine as indicated. The boxed data were generated using TPP with the truncated RNAs 91 thiM and 111 thiC. FIG. 7C shows that patterns of spontaneous cleavage of 165 thiM differ between thiamine and TPP ligands as depicted by PAGE analysis (left) and as reflected by graphs (right) representing the relative phosphorimager counts for the three lanes as indicated. Details for the RNA probing analysis are similar to those described above in connection with FIG. 6A. The graphs were generated by ImageQuant software.

iii. High Sensitivity and Selectivity of mRNA Leaders for Metabolite Binding.

FIG. 8A shows chemical structures of several analogues of thiamine. TD is thiamine disulfide and THZ is 4-methyl-5-β-hydroxyethylthiazole. FIG. 8B shows PAGE analysis of 165 thiM RNA structure probing using TPP and various chemical analogues (40 μM each) as indicated. Locations of significant structural modulation within the RNA spanning nucleotides ~113 to ~150 are indicated by open arrowheads. The asterisk identifies the site (C144) used to compare the normalized fraction of RNA that is cleaved (bottom) in the presence of specific compounds. Details for the RNA probing analysis are similar to those described above in connection with FIG. 6A. FIG. 8C shows a summary of the features of TPP that are critical for molecular recognition. FIG. 8D shows equilibrium dialysis using ³H-thiamine as a tracer. Plotted are the ratios for tritium distribution in a two-chamber system (a and b) that were established upon equilibration in the presence of the RNA constructs in chamber b as indicated (see below for a description of the non-TPP-binding mutant M3). 100 μM TPP or oxythiamine was added to chamber a, as denoted, upon the start of equilibration.

iv. Mutational Analysis of the Structure and Function of the thiM Riboswitch.

FIG. 9A shows mutations present in constructs M1 through M8 relative to the 165 thiM RNA. P8* is a putative base-paired element between portions (shaded) of the P1 and P8 stems. FIGS. 9B and 9C show in vitro ligand-binding and genetic control functions of the wild-type (WT), M1 and M2 RNAs as reflected by PAGE analysis of in-line probing experiments (10 μM TPP) and by β-galactosidase expression assays. Labels on PAGE gels are as described above in connection with FIG. 6A. Bars represent the levels of gene expression in the presence (+) and the absence (−) of TPP in the culture medium. FIG. 9D is a summary of similar analyses of WT through M9, presented in table form. The SD status “n.d.” (not determined) indicates either that the level of spontaneous cleavage detected in the absence and presence of TPP is near the limit of detection (M6, M7 and M8) or that the region adopts an atypical structure (M9) compared to WT.

8. Discussion

β-galactosidase fusion constructs were prepared that encompass the 5′-untranslated region of thiM and thiC mRNAs of E. coli, which includes a previously identified “thi box” domain whose sequence and potential secondary structure are conserved in several species of bacteria and archaea (Miranda-Rios et al., A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. USA 98, 9736-9741 (2001)). The thiM and thiC translational fusion constructs exhibit thiamine-dependent suppression of β-galactosidase activity of 18- and 110-fold, respectively, when host cells are grown in a minimal medium that otherwise lacks a source of thiamine. A transcriptional fusion containing the thiM leader is not subject to suppression by thiamine, but a similar fusion with the thiC leader yields a 16-fold modulation with thiamine, suggesting that a significant portion of the genetic control observed with thiC occurs at the level of transcription.

These constructs were used to prepare DNA templates by PCR for in vitro transcription of RNA fragments. The resulting RNAs were subjected to a structure-probing process (see Example 1) to reveal whether the RNAs undergo structure modulation upon binding of ligands. Internucleotide linkages in unstructured regions are more likely to undergo spontaneous cleavage compared to linkages that reside in highly structured regions of an RNA (Soukup & Breaker, Relationship between internucleotide linkage geometry and the stability of RNA. RNA 5, 1308-1325 (1999)). The 165-nucleotide thiM RNA fragment (165 thiM) has a distinct pattern of cleavage products that is generated when the RNA is incubated for an extended period in the absence of TPP (FIG. 6A). Upon addition of 100 μM TPP, 165 thiM undergoes substantial structural alteration as many internucleotide linkages in the region spanning positions 39 through 80 exhibit a reduction in spontaneous cleavage. This indicates that TPP binds to the RNA and stabilizes a defined structure within this region, resulting in a lower rate of fragmentation.

The fragmentation patterns are largely congruent with potential base-paired and bulge structures that are identified by a secondary-structure prediction algorithm (Zuker et al., Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (eds. Barciszewski J. & Clark, B. F. C.) 11-43 (NATO ASI Series, Kluwer Academic Publishers, 1999); Mathews et al., Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911-940 (1999)). Most linkages that experience a ligand-induced reduction of cleavage are encompassed by the thi box and nucleotides that reside immediately 5′ relative to this domain (FIG. 6B). Other linkages that undergo cleavage, but that are not modulated by TPP, are predicted to reside in bulges or in the loops of hairpins. Predicted base-paired structures labeled P2 through P7 encompass linkages that exhibit the lowest levels of spontaneous cleavage, implying that they remain structured in both the presence and absence of TPP. Interestingly, nucleotides 126 through 130 encompass the only region apart from those described above that becomes more structured upon TPP addition. These nucleotides correspond to the Shine-Dalgarno (SD) sequence, which is required for efficient translation of mRNAs in prokaryotes. These findings are consistent with a genetic control mechanism wherein the thiM RNA binds to TPP and forms a complex wherein the ribosome cannot gain access to the SD sequence.

Similarly, structure probing was used to examine the mRNA leader for thiC. The 240 thiC RNA also exhibits extensive modulation of its pattern of spontaneous cleavage, and again the majority of the changing pattern is located in the thi box and in the region located immediately upstream of this domain (FIG. 6C). These regions of highest structure modulation in thiM and thiC can be folded into similar secondary structures (FIG. 6D), and carry several common sequence elements within and adjacent to the thi box domain. Thus, the structures of thiM and thiC spanning stems P1 through P5 comprise TPP-binding motifs that are analogous to aptamers, which are engineered ligand-binding RNAs (Osborne & Ellington, Nucleic acid selection and the challenge of combinatorial chemistry. Chem. Rev. 97, 349-370 (1997); Hermann & Patel, Adaptive recognition by nucleic acid aptamers. Science 287, 820-825 (2000); Gold et al., Diversity of oligonucleotide functions. Annu. Rev. Biochem. 64, 763-797 (1995)). Nucleotides residing 3′ relative to this natural TPP aptamer are involved in converting the metabolite binding event into a genetic response.

The sensitivity of metabolite detection by these mRNAs was assessed by establishing apparent dissociation constant (apparent K_(D)) values for TPP, thiamine, and thiamine monophosphate (TP). Values were generated by monitoring the extent of spontaneous cleavage at several ligand-sensitive sites within the RNA under a range of ligand concentrations. For example, probing of a trace amount of 165 thiM RNA under TPP concentrations ranging from zero to 100 μM (or up to 10 mM with certain analogues) reveals that half-maximal modulation of RNA structure occurs when approximately 600 nM TPP is present (FIG. 7A), which reflects an apparent K_(D) of 600 nM. Likewise, probing of 240 thiC reveals an apparent K_(D) of 100 nM. Both 165 thiM and 240 thiC RNAs appear to bind TPP more avidly than TP or thiamine, with thiC exhibiting more than 1,000-fold discrimination against TP and thiamine (FIG. 7B). The fact that TPP is the strongest modulator of RNA structure is consistent with genetic observations in Salmonella typhimurium that TPP synthesis is required for regulation of expression of thiamine biosynthesis genes (Webb et al., Thiamine pyrophosphate (TPP) negatively regulates transcription of some thi genes of Salmonella typhimurium. J. Bacteriol. 178, 2533-2538 (1996)). The differential specificity achieved by the RNAs, which is a phenomenon that is commonly observed for receptor-ligand systems made of protein, indicates that these ligand-binding RNAs would be receptive to specificity changes (through, for example, natural or artificial evolutionary forces).
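To put the reported discrimination in energetic terms, the ratio of apparent K_(D) values can be converted into a difference in binding free energy using the standard relation ΔΔG = RT ln(K_(D,analogue) / K_(D,ligand)). The sketch below is illustrative only: free energies are not reported in the text, the function names are hypothetical, and the 100 μM analogue value is an assumed placeholder chosen to represent a 1,000-fold difference relative to the 100 nM apparent K_(D) of 240 thiC for TPP.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def fold_discrimination(kd_analogue, kd_ligand):
    """How many times weaker the analogue binds than the preferred ligand."""
    return kd_analogue / kd_ligand


def delta_delta_g(fold, temp_c=25.0):
    """Binding free energy difference (kcal/mol) implied by a K_D ratio."""
    return GAS_CONSTANT_KCAL * (temp_c + 273.15) * math.log(fold)


if __name__ == "__main__":
    kd_tpp = 100e-9        # apparent K_D of 240 thiC for TPP (from the text)
    kd_analogue = 100e-6   # assumed value giving a 1,000-fold difference
    fold = fold_discrimination(kd_analogue, kd_tpp)
    print(f"{fold:.0f}-fold discrimination is about "
          f"{delta_delta_g(fold):.1f} kcal/mol at 25 C")
```

A 1,000-fold difference at 25° C. corresponds to roughly 4 kcal/mol, which gives a sense of the energetic contribution attributable to the features that distinguish TPP from TP and thiamine.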

The actual K_(D) values for RNA-ligand interactions might be different inside cells where physiological conditions of Mg²⁺ and other agents that can influence RNA structure will not match those of the in vitro assays. Also, the nature of the RNA construct can be a source of an altered K_(D). For example, the minimized 91 thiM construct (FIG. 6A), which largely encompasses only the putative natural aptamer, retains the ability to bind TPP and exhibits an apparent K_(D) that is improved by approximately 20-fold compared to the longer construct (FIG. 7B). Thus, the affinity for TPP might vary as the nascent RNA transcript emerges from the active site of RNA polymerase or the ribosome. Furthermore, this result demonstrates that the 91 thiM aptamer domain can be separated from RNA components (collectively termed the “expression platform”) that are directly controlling gene expression. This modular construction, involving the physical and functional separation of aptamer and expression platform domains, allows the generation of TPP-controlled RNAs by rational RNA engineering strategies (or through evolutionary processes).

Spontaneous cleavage at several linkages within the thi box domain of 165 thiM specifically correlates with the type of ligand used. Although TPP reduces spontaneous cleavage of 165 thiM at nucleotides A61, U62 and to a smaller extent at U79, these same sites retain an elevated level of cleavage when thiamine is present near its saturating concentration (FIG. 7C). These nucleotides cluster at an internal bulge within the thi box domain, and appear to contribute to the binding site for the phosphate groups of TPP.

The structural modulation of 165 thiM was further examined in the presence of several analogues that carry certain structural features of thiamine (FIG. 8A). Thiamine and its phosphorylated derivatives TP and TPP induce modulation as expected (FIG. 8B). However, oxythiamine and other thiamine analogues with less similarity to TPP fail to induce structure modulation. The performance of this sampling of analogues indicates that the RNA makes specific contacts to distal parts of its ligand and that both the pyrimidine and phosphate groups carry important elements for molecular recognition (FIG. 8C). Similar results are obtained by using equilibrium dialysis assays (FIG. 8D). For example, the addition of 91 thiM RNA to chamber b of an equilibrium dialysis assembly causes a shift in the distribution of ³H-thiamine in favor of chamber b, unless an excess of unlabeled TPP is also included. However, the presence of oxythiamine does not significantly restore the tritium distribution to unity, which is expected because probing data indicate that it is not able to bind the RNA. These findings indicate that the aptamer domain of the TPP riboswitch is highly selective for its target ligand.

The secondary structure model for 165 thiM RNA was examined in greater detail by generating and testing a series of variant constructs (FIG. 9A). For example, variant M1 carries a mutation that disrupts the predicted P3 pairing element. This mutation causes a loss of TPP binding (FIG. 9B, e.g. see position C77) and a loss of genetic control of the corresponding β-galactosidase fusion construct (FIG. 9C, graph). Re-establishment of base pairing in the double-mutant construct M2 restores both TPP binding and genetic control. Similarly, disruptive and restorative mutations encompassed by constructs M3 through M6 are consistent with the formation of stems P5 and P8. Upon the addition of TPP, the SD element of both the WT and M2 constructs becomes sequestered in a structure that precludes a high level of spontaneous cleavage. In contrast, the M1 construct does not exhibit SD modulation (FIGS. 9B and 9C, nucleotides 126-130). These results are consistent with the genetic switch being turned off by a mechanism whereby TPP binding ultimately promotes the stable formation of P8, which reduces access to the SD by the ribosome.

The partner of the SD sequence in P8 (nucleotides 108 to 111) remains resistant to spontaneous cleavage both in the presence and absence of TPP (FIG. 6A). This is consistent with the formation of P8, upon addition of TPP, due to the displacement of an alternative structure that otherwise prevents this anti-SD element from forming P8. Furthermore, nucleotides 83 through 86 are complementary to the anti-SD element and this region also resists spontaneous cleavage in the presence and absence of TPP. A mechanism by which genetic control could result, which is tested as described below, is via the mutually exclusive formation of P8* in the ‘On’ state versus the simultaneous formation of P1 and P8 in the metabolite-bound ‘Off’ state (FIG. 9D).

Constructs M7 through M9 were tested in an assessment of this mechanism. Construct M7 carries a U109C mutation in the anti-SD sequence that is designed to destabilize the P8 interaction while simultaneously destabilizing the P8* interaction. M7 retains TPP binding function and exhibits a significant level of genetic modulation (FIG. 9C, box), which is expected if the mutation does not disrupt the relative distribution of mRNAs between the ‘On’ and ‘Off’ states. In comparison, M8 (U110C) retains TPP binding, exhibits a dramatic reduction in the level of reporter expression, and loses nearly all genetic modulation. In addition, M8 no longer exhibits detectable spontaneous cleavage in the SD sequence, which is consistent with the thermodynamic balance between P8 and P8* formation being shifted decidedly in favor of P8 in this RNA variant. Construct M9, which carries four mutations in the anti-SD element, has a significantly different pattern of spontaneous cleavage in the SD region. M9 fails to reduce gene expression upon thiamine addition to cells, despite the fact that the construct retains TPP binding in vitro. It is evident from these data that TPP binding restricts the structural freedom of the SD element in the appropriate RNA variants, and that this correlates with genetic control.

C. Example 3 Metabolite-Binding Riboswitches

1. Introduction

Modern organisms must coordinate the expression of many hundreds of genes in response to metabolic demands and environmental changes. Each gene product must be regulated temporally, quantitatively, and oftentimes spatially. Additionally, genetic control processes must be dynamic, rapid, and selectively responsive to the specific conditions undergoing change. Therefore, organisms require sentries of genetic regulatory factors that continuously quantify a multitude of environmental signals. Upon measurement of a particular signal, which may be one of many possible biochemical or physical cues, these regulatory factors must modulate expression of a specific subset of the organism's genes.

It has generally been assumed that proteins are the obligate sensors of these signals because proteins are a proven medium for forming highly responsive sensors. However, it was discovered that mRNAs also are capable of acting as direct sensors of chemical and physical conditions for the purpose of genetic control. Classes of mRNA domains, collectively referred to as ‘riboswitches’, serve as RNA genetic control elements that sense the concentrations of specific metabolites by directly binding the target compound. Riboswitches that have been discovered are responsible for sensing metabolites that are critical for fundamental biochemical processes including adenosylcobalamin (AdoCbl) (see Example 1), thiamine pyrophosphate (TPP) (see Example 2), flavin mononucleotide (FMN), S-adenosylmethionine (SAM) (see Example 7), lysine (see Example 5), guanine (see Example 6), and adenine (see Example 8). Upon interaction with the appropriate small molecule ligand, riboswitch mRNAs undergo a structural reorganization that results in the modulation of genes that they encode. To date, all riboswitches that have been examined in detail cause genetic repression upon binding their target ligand, although riboswitches that activate gene expression upon ligand binding can be produced (and will likely be found in nature).

In each instance, riboswitch domains have been subjected to a battery of biochemical and genetic analyses in order to convincingly demonstrate that direct interaction of small organic metabolites with mRNA receptors leads to a corresponding alteration in genetic expression. This example provides a brief summary of these efforts and of some of the general characteristics that are exhibited by riboswitches. Using these discoveries and the principles of riboswitch operation described in this example and elsewhere herein, those of skill in the art can use and adapt riboswitches for many purposes including use as genetic tools and as targets for development of antimicrobials.

2. General Organization of Riboswitch RNAs

Bacterial riboswitch RNAs are genetic control elements that are located primarily within the 5′-untranslated region (5′-UTR) of the main coding region of a particular mRNA. Structural probing studies (discussed further below) revealed that riboswitch elements are generally composed of two domains: a natural aptamer (T. Hermann, D. J. Patel, Science 2000, 287, 820; L. Gold, et al., Annual Review of Biochemistry 1995, 64, 763) that serves as the ligand-binding domain (referred to herein as the aptamer domain), and an ‘expression platform’ that interfaces with RNA elements that are involved in gene expression (e.g. Shine-Dalgarno (SD) elements; transcription terminator stems). These conclusions are drawn from the observation that aptamer domains synthesized in vitro bind the appropriate ligand in the absence of the expression platform (see Examples 2 and 6). Moreover, structural probing investigations suggest that the aptamer domain of most riboswitches adopts a particular secondary- and tertiary-structure fold when examined independently, that is essentially identical to the aptamer structure when examined in the context of the entire 5′ leader RNA. This implies that, in many cases, the aptamer domain is a modular unit that folds independently of the expression platform (see Examples 2 and 6).

Ultimately, the ligand-bound or unbound status of the aptamer domain is interpreted through the expression platform, which is responsible for exerting an influence upon gene expression. The view of a riboswitch as a modular element is further supported by the fact that aptamer domains are highly conserved amongst various organisms (and even between kingdoms, as is observed for the TPP riboswitch), whereas the expression platform varies in sequence, structure, and in the mechanism by which expression of the appended open reading frame is controlled. For example, ligand binding to the TPP riboswitch of the tenA mRNA of B. subtilis causes transcription termination. This expression platform is distinct in sequence and structure compared to the expression platform of the TPP riboswitch in the thiM mRNA from E. coli, wherein TPP binding causes inhibition of translation by a SD blocking mechanism (see Example 2). The TPP aptamer domain is easily recognizable and of near identical functional character between these two transcriptional units, but the genetic control mechanisms and the expression platforms that carry them out are very different.

Aptamer domains for riboswitch RNAs typically range from ~70 to 170 nt in length (FIG. 11). This observation was somewhat unexpected given that in vitro evolution experiments identified a wide variety of small molecule-binding aptamers, which are considerably shorter in length and structural intricacy (T. Hermann, D. J. Patel, Science 2000, 287, 820; L. Gold, et al., Annual Review of Biochemistry 1995, 64, 763; M. Famulok, Current Opinion in Structural Biology 1999, 9, 324). The substantial increase in complexity and information content of the natural aptamer sequences relative to artificial aptamers is most likely required to form RNA receptors that function with high affinity and selectivity. Apparent K_(D) values for the ligand-riboswitch complexes range from low nanomolar to low micromolar. It is also worth noting that some aptamer domains, when isolated from the appended expression platform, exhibit improved affinity for the target ligand over that of the intact riboswitch (~10- to 100-fold) (see Example 2). This likely represents an energetic cost in sampling the multiple distinct RNA conformations required by a fully intact riboswitch RNA, which is reflected by a loss in ligand affinity. Since the aptamer domain must serve as a molecular switch, this might also add to the functional demands on natural aptamers that might help rationalize their more sophisticated structures.

3. Riboswitch Regulation of Transcription Termination in Bacteria

Bacteria primarily make use of two methods for termination of transcription. Certain genes incorporate a termination signal that is dependent upon the Rho protein (J. P. Richardson, Biochimica et Biophysica Acta 2002, 1577, 251), while others make use of Rho-independent terminators (intrinsic terminators) to destabilize the transcription elongation complex (I. Gusarov, E. Nudler, Molecular Cell 1999, 3, 495; E. Nudler, M. E. Gottesman, Genes to Cells 2002, 7, 755). The latter RNA elements are composed of a GC-rich stem-loop followed by a stretch of 6-9 uridyl residues. Intrinsic terminators are widespread throughout bacterial genomes (F. Lillo, et al., Bioinformatics 2002, 18, 971), and are typically located at the 3′-termini of genes or operons. Interestingly, an increasing number of examples are being observed for intrinsic terminators located within 5′-UTRs.
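As a purely illustrative sketch of how the features just described (a GC-rich stem-loop followed by a run of uridyl residues) might be screened for computationally, the following heuristic scans an RNA sequence for a U tract and then looks for a perfectly paired hairpin ending immediately upstream of it. This is not the algorithm of the cited search tools; the stem, loop, and GC thresholds, the function names, and the test sequence are all hypothetical choices made for the example, and only a minimum (not a maximum) U-tract length is enforced.

```python
import re

# Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def find_hairpin_ending_at(seq, end, min_stem=5, max_stem=12,
                           min_loop=3, max_loop=8, min_gc=0.6):
    """Return (start, stem_bp, loop_nt) for the longest perfectly paired,
    GC-rich stem-loop whose 3' arm ends just before index `end`, or None."""
    for stem in range(max_stem, min_stem - 1, -1):
        for loop in range(min_loop, max_loop + 1):
            start = end - 2 * stem - loop
            if start < 0:
                continue
            five_arm = seq[start:start + stem]
            three_arm = seq[end - stem:end]
            if all((a, b) in PAIRS for a, b in zip(five_arm, reversed(three_arm))):
                gc = sum(base in "GC" for base in five_arm + three_arm) / (2 * stem)
                if gc >= min_gc:
                    return start, stem, loop
    return None

def find_intrinsic_terminators(rna, min_u=6):
    """List candidate intrinsic terminators: hairpin followed directly by a U tract."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for match in re.finditer(r"U{%d,}" % min_u, rna):
        hairpin = find_hairpin_ending_at(rna, match.start())
        if hairpin is not None:
            start, stem, loop = hairpin
            hits.append({"hairpin_start": start, "stem_bp": stem,
                         "loop_nt": loop, "u_tract": (match.start(), match.end())})
    return hits

# Hypothetical test sequence: flank + 5' arm + loop + 3' arm + U tract + flank
example = "AACA" + "GCCGCG" + "GAAA" + "CGCGGC" + "UUUUUUUU" + "ACG"
hits = find_intrinsic_terminators(example)
print(hits)
```

A practical search would additionally score stem stability, tolerate bulges and mismatches, and allow a short spacer between the stem and the U tract; those refinements are omitted here for brevity.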

Amongst the wide variety of genetic regulatory strategies employed by bacteria there is a growing class of examples wherein RNA polymerase responds to a termination signal within the 5′-UTR in a regulated fashion (T. M. Henkin, Current Opinion in Microbiology 2000, 3, 149). During certain conditions the RNA polymerase complex is directed by external signals either to perceive or to ignore the termination signal. Although transcription initiation might occur without regulation, control over mRNA synthesis (and of gene expression) is ultimately dictated by regulation of the intrinsic terminator. Generally, one of at least two mutually exclusive mRNA conformations results in the formation or disruption of the RNA structure that signals transcription termination. A trans-acting factor, which in some instances is a RNA (F. J. Grundy, et al., Proceedings of the National Academy of Sciences of the United States of America 2002, 99, 11121; T. M. Henkin, C. Yanofsky, Bioessays 2002, 24, 700) and in others is a protein (J. Stulke, Archives of Microbiology 2002, 177, 433), is generally required for receiving a particular intracellular signal and subsequently stabilizing one of the RNA conformations. Riboswitches offer a direct link between RNA structure modulation and the metabolite signals that are interpreted by the genetic control machinery. A brief overview of the FMN riboswitch from a B. subtilis mRNA is provided below to illustrate this mechanism.

i. A Natural Aptamer for FMN

A highly conserved RNA domain, referred to as the RFN element, was identified in bacterial genes involved in the biosynthesis and transport of riboflavin and FMN (M. S. Gelfand, et al., Trends in Genetics 1999, 15, 439; A. G. Vitreschak, et al., Nucleic Acids Research 2002, 30, 3141). This element is required for genetic regulation of the ribDEAHT operon (hereafter, ‘ribD’) of B. subtilis, as mutations resulted in a loss of FMN-mediated regulation (Y. V. Kil, et al., Molecular & General Genetics 1992, 233, 483; V. N. Mironov, et al., Molecular & General Genetics 1994, 242, 201). These data led to the proposal that either a protein-based FMN sensor, or FMN itself (G. D. Stormo, Y. Ji, Proceedings of the National Academy of Sciences of the United States of America 2001, 98, 9465) interacts with the RFN element in order to repress ribD gene expression. However, there was no understanding of how such interactions would take place or the mechanism by which expression would be affected. Although RNA sequences that specifically bind FMN had been identified through directed evolution experimentation (C. T. Lauhon, J. W. Szostak, Journal of the American Chemical Society 1995, 117, 1246; M. Roychowdhury-Saha, et al., Biochemistry 2002, 41, 2492), they exhibit no obvious resemblances to the RFN element.

a. Structural Probing Reveals FMN-Mediated RNA Structure Modulation

Each internucleotide linkage in a RNA polymer is susceptible to spontaneous hydrolysis by an S_(N)2-like mechanism, wherein the 2′ oxygen attacks the adjacent phosphorus center, leading to chain cleavage. This reaction requires a 180° orientation between the attacking nucleophile, the phosphorus center, and the 5′-oxygen leaving group (in-line conformation) (G. A. Soukup, R. R. Breaker, RNA 1999, 5, 1308; V. Tereshko, et al., RNA 2001, 7, 405). Nucleotides that are base-paired, or otherwise structurally constrained, are typically incapable of adopting this configuration and therefore display low rates of spontaneous cleavage. In contrast, nucleotides that are structurally unrestrained exhibit much higher rates of spontaneous cleavage. These observations have been exploited in a structural probing method, referred to as “in-line probing”, which establishes the relative rates of spontaneous cleavage for a given RNA polymer and correlates this with secondary- and tertiary-structure models (V. Tereshko, et al., RNA 2001, 7, 405).
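The in-line geometry described above can be expressed as a simple angle calculation. The following sketch, offered only as an illustration, computes the angle at the phosphorus center between the attacking 2′ oxygen and the 5′-oxygen leaving group from three sets of Cartesian coordinates; the function name and the coordinates shown are hypothetical and are not taken from any solved structure.

```python
import math

def in_line_angle(o2_prime, phosphorus, o5_prime):
    """Angle in degrees at the phosphorus between the 2'-oxygen and the 5'-oxygen.
    A value near 180 corresponds to the in-line conformation."""
    v1 = [a - b for a, b in zip(o2_prime, phosphorus)]
    v2 = [a - b for a, b in zip(o5_prime, phosphorus)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    # Clamp to guard against rounding slightly outside [-1, 1]
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates (angstroms): a perfectly in-line arrangement and a bent one
aligned = in_line_angle((-3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.6, 0.0, 0.0))
bent = in_line_angle((0.0, 3.0, 0.0), (0.0, 0.0, 0.0), (1.6, 0.0, 0.0))
print(aligned, bent)
```

Applied across every linkage of a structural model, such a calculation gives one rough way to compare predicted in-line character with observed patterns of spontaneous cleavage.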

To assess whether the RFN element of ribD was responsive to FMN, a fragment of the corresponding 5′-UTR was 5′-³²P labeled and incubated in the absence and presence of FMN, and the resulting fragments were analyzed by polyacrylamide gel electrophoresis (PAGE). Interestingly, patterns differ between reactions with and without FMN, signifying that there is a structural rearrangement of the RNA upon FMN binding to ribD. The spontaneous cleavages of certain nucleotide positions located within inter-helical regions of the RFN element become significantly reduced in the presence of FMN, suggesting that these nucleotides are involved in forming an FMN-RNA complex, which forces structural constraints upon the RNA (FIG. 12). It is this type of structural modulation that can be harnessed by the expression platform for allosteric modulation of gene expression.

Additional evidence for direct binding of FMN by the ribD RFN element was generated by enzymatic probing. Oligonucleotides predicted to anneal with the RFN element were added to ribD transcripts in the presence and absence of FMN, and the resulting mixtures were digested with RNase H (which specifically cleaves RNA:DNA heteroduplexes) and analyzed by PAGE (A. S. Mironov, et al., Cell 2002, 111, 747). A significant portion of transcripts bind certain oligonucleotides in the absence of FMN, but not in the presence of FMN, indicating that FMN stabilizes a structural rearrangement of ribD transcripts that in turn prevents annealing of the oligonucleotide.

b. Affinity and Specificity of the FMN-ribD Complex

If the RFN element serves as an aptamer for FMN, it should exhibit characteristics of a saturable receptor that has some ability to discriminate against related ligands. To obtain values for apparent dissociation constant (apparent K_(D)) for FMN, in-line probing assays were repeated with trace amounts of ribD RNA and increasing concentrations of FMN; the ligand concentration that correlates with half-maximal modulation of RNA structure should reflect the apparent K_(D). These experiments indicate that the ribD RNA contains a saturable ligand-binding site that exhibits an apparent K_(D) of ~5 nM. Furthermore, the RNA discriminates against the dephosphorylated form of FMN (riboflavin) by approximately three orders of magnitude. This exceptional ligand specificity of the ribD mRNA is surprising since the aptamer must generate a binding pocket for FMN that makes productive interactions with a phosphate group.
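To put these figures in energetic terms, an apparent K_(D) and a fold discrimination can each be converted into a free energy using the standard relation ΔG = RT ln K_(D). The short sketch below does this; the temperature of 298.15 K and the 1 M standard state are assumptions made for the illustration and are not conditions stated above, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard free energy of binding (kcal/mol) from a dissociation constant,
    relative to an assumed 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)

def discrimination_energy(fold, temp_k=298.15):
    """Free energy difference (kcal/mol) corresponding to a fold difference in K_D."""
    return R_KCAL * temp_k * math.log(fold)

dg_fmn = binding_free_energy(5e-9)       # apparent K_D of about 5 nM
ddg_ribo = discrimination_energy(1000)   # about three orders of magnitude
print(f"dG ~ {dg_fmn:.1f} kcal/mol, ddG ~ {ddg_ribo:.1f} kcal/mol")
```

Under these assumptions, a three-orders-of-magnitude preference corresponds to roughly 4 kcal/mol of binding energy attributable to recognition of the phosphate group.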

ii. FMN-Induced Transcription Termination

a. In Vitro Transcription Termination Mediated by an FMN Riboswitch

The relative amounts of the major transcription products for the ribD leader region were examined by in vitro transcription using T7 RNA polymerase or Bacillus subtilis RNA polymerase. The ribD leader region contains a classical intrinsic terminator just upstream of the ribD coding region. Interestingly, transcripts that terminated at the intrinsic terminator are specifically induced by FMN, in the absence of additional protein factors. Furthermore, mutations in the RFN element abrogate this phenomenon. The left-half of the terminator sequence forms alternative base-pairing interactions with a portion of the RFN element, thereby forming an antiterminator element. Sequence alterations of the intrinsic terminator eliminate FMN-induced termination while alterations in the antiterminator result in constitutive termination. Taken together, these observations are consistent with a mechanistic model wherein FMN directly interacts with ribD transcripts during conditions of excess FMN. Complex formation subsequently induces transcription termination within the 5′-UTR (FIG. 12), which precludes gene expression by preventing the ORF from being transcribed. During conditions of limiting FMN, an antiterminator structure is formed within the ribD nascent transcript, which allows for synthesis of the downstream genes.

b. FMN-Mediated Control of Transcription Termination In Vivo

The molecular details of riboswitch-mediated transcription termination are likely to be more complex than this rather simplistic model implies. For example, given that the ‘decision’ to form the terminator or antiterminator conformation occurs only once during transcription, the regulatory mechanism is likely to rely on precise transcriptional kinetics as well as the appropriate RNA folding pathways. Moreover, the kinetics of FMN interacting with the RNA receptor is likely a critical factor. Although the affinity that the RNA has for FMN is exceptionally strong compared to engineered aptamers, it is possible that the kinetics of ligand association might be the more important determinant of genetic regulation. Indeed, all of these parameters are likely to conspire together in order to exert appropriate control over the intrinsic terminator. In adapting and designing riboswitches for use as described herein, the impact of transcription speed should be taken into account.
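One rough way to reason about the interplay between ligand association kinetics and transcription speed is a pseudo-first-order estimate of how much ligand binding can occur before the polymerase reaches the terminator. The sketch below is a deliberately simplified illustration: it ignores ligand dissociation, RNA folding time, and polymerase pausing, and every numerical value (association rate constant, ligand concentration, distance, and elongation speed) is a hypothetical placeholder rather than a measured quantity.

```python
import math

def fraction_bound_in_window(k_on, ligand_molar, window_nt, speed_nt_per_s):
    """Fraction of nascent transcripts that bind ligand before the polymerase
    traverses `window_nt` nucleotides, assuming irreversible pseudo-first-order
    association with rate k_on * [ligand]."""
    window_s = window_nt / speed_nt_per_s
    return 1.0 - math.exp(-k_on * ligand_molar * window_s)

# Hypothetical values: k_on = 1e5 per molar per second, 1 micromolar ligand,
# 100 nt between the aptamer and the terminator
fast = fraction_bound_in_window(1e5, 1e-6, 100, 50.0)   # faster polymerase
slow = fraction_bound_in_window(1e5, 1e-6, 100, 10.0)   # slower polymerase
print(f"fast polymerase: {fast:.2f}, slow polymerase: {slow:.2f}")
```

In this toy model a slower polymerase leaves a longer window for ligand binding, so a riboswitch operating under kinetic control could require ligand concentrations well above its equilibrium K_(D) to terminate transcription efficiently.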

iii. Control of Transcription Termination by Other Riboswitches

Intrinsic terminators can be identified via computer-assisted search algorithms (F. Lillo, et al., Bioinformatics 2002, 18, 971). Using such bioinformatic analyses, a subset of riboswitch RNAs that are predicted to contain an intrinsic terminator and an alternate antiterminator structural element can be identified (M. Mandal, et al., Cell 2003, 113; A. G. Vitreschak, et al., Nucleic Acids Research 2002, 30, 3141; F. J. Grundy, T. M. Henkin, Molecular Microbiology 1998, 30, 737; S. Kochhar, H. Paulus, Microbiology 1996, 142, 1635; D. A. Rodionov, et al., Journal of Biological Chemistry 2002, 277, 48949). Therefore, the results described above for the FMN riboswitch are indicative of the mechanisms used by many other riboswitch RNAs. Indeed, SAM- and TPP-dependent riboswitches have been demonstrated to exert control over termination via formation of mutually exclusive intrinsic terminator and antiterminator structures (see, e.g., Example 7). Furthermore, mutations that disrupt and subsequently restore helices within the SAM riboswitch aptamer result in loss and restoration, respectively, of SAM binding. Concurrently, these mutations also result in disruption or restoration of SAM-induced transcription termination in accordance with ligand-binding function. Riboswitches can be adapted and designed to exert control over transcription termination signals that differ appreciably from classical intrinsic terminators according to principles described herein. As described elsewhere herein, expression platform domains having expression-controlling stem structures can be matched to aptamer domains by designing the P1 stem of the aptamer domain such that the control strand (P1b) of the aptamer can form a stem structure with the regulated strand (P1c) of the expression platform.

4. Riboswitch Regulation of Translation Initiation in Bacteria

An alternative mechanism of genetic control by riboswitches is the modulation of translation initiation. Unlike transcription termination, the entire mRNA would be synthesized by RNA polymerase, but expression would be prevented by the riboswitch until the metabolite concentration reached a certain level. In most instances, it was observed that riboswitches prevent translation initiation in the presence of high concentrations of target metabolite. However, riboswitches can be designed and adapted such that allosteric modulation of riboswitch structures could lead to translation activation. The regulatory mechanism of translation control is briefly described below for a TPP riboswitch from E. coli.

i. A Natural Aptamer for TPP

A conserved RNA element, referred to as the thi box, was identified within 5′-UTRs of mRNAs that are responsible for thiamine biosynthesis and transport (D. A. Rodionov, et al., Journal of Biological Chemistry 2002, 277, 48949; J. Miranda-Rios, M. Navarro, M. Soberon, Proceedings of the National Academy of Sciences of the United States of America 2001, 98, 9736). Genetic experiments confirmed that this structural element was required for thiamine-dependent regulation of Rhizobium meliloti thiamine biosynthesis genes (J. Miranda-Rios, M. Navarro, M. Soberon, Proceedings of the National Academy of Sciences of the United States of America 2001, 98, 9736), yet no regulatory factor had been identified through classical genetic experimentation. Therefore, it was possible that the thi box might serve as a portion of a riboswitch that responds to thiamine or its derivatives.

In E. coli, thiamine biosynthesis and transport genes are primarily located within three operons and four single genes (T. P. Begley, et al., Archives of Microbiology 1999, 171, 293), wherein each operon is preceded by a thi element. To begin to assess the regulatory properties of these sequences, the leader regions for the thiMD and thiCEFSGH operons were utilized to construct transcriptional and translational fusions to a lacZ reporter gene (see Example 2). Addition of exogenous thiamine results in repression of the lacZ reporter gene in E. coli. Results from these data demonstrate that the thiM gene is regulated primarily at the level of translation while the thiC leader region confers both transcriptional and translational regulation to the lacZ reporter.

a. Direct Binding of Thiamine Pyrophosphate by E. coli mRNAs

As described above for the FMN aptamer, direct binding of TPP to the thiM and thiC leaders was demonstrated by in-line probing assays (see Example 2). The addition of thiamine, thiamine monophosphate (TP), or the pyrophosphate derivative (TPP) leads to structural rearrangement of the thiM RNA, particularly in the region encompassing the thi element (FIG. 13). Significantly, TPP, which is the bioactive form of thiamine, exhibits the best affinity among the ligands, with an apparent K_(D) of 500 nM, while TP and thiamine associate with thiM with apparent K_(D) values of 3 μM and 40 μM, respectively. In-line probing assays of RNAs resembling the thiC leader region reveal even more dramatic discrimination between thiamine and its phosphorylated forms, exhibiting greater than a 1,000-fold difference between binding of thiamine and TPP. These data are consistent with genetic experiments that suggested that TPP synthesis was required for regulation (E. Webb, et al., Journal of Bacteriology 1996, 178, 2533; E. Webb, D. Downs, Journal of Biological Chemistry 1997, 272, 15702). Also, this system provides another example of a natural RNA aptamer that makes productive contacts to phosphate groups.

b. Confirmation of TPP Binding by Equilibrium Dialysis

RNAs resembling the thiM leader region were synthesized and placed into one side of a two-chamber equilibrium dialysis apparatus, in which the compartments are separated by a 3000-dalton molecular-weight-cut-off dialysis membrane. ³H-thiamine was preferentially retained within the thiM-containing chamber when allowed to equilibrate between chambers (see Example 2). This effect could be eliminated by providing excess unlabeled thiamine, but could not be reversed when supplemented with oxythiamine, a close chemical analog of thiamine. Additionally, a mutated version of thiM was unable to shift ³H-thiamine to the RNA-containing chamber. Together, these data are indicative of the formation of stable thiM:thiamine complexes, wherein the sequence of the RNA and the chemical form of the ligand are critical for maximal binding affinity.
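The expected outcome of such an equilibrium dialysis experiment can be estimated with a simple model. The sketch below assumes equal chamber volumes, a trace amount of labeled ligand (so that essentially all RNA is free in the absence of competitor), and an unlabeled competitor present in large excess over the RNA; under those assumptions the ratio of label in the RNA-containing chamber to label in the RNA-free chamber is 1 + [RNA free]/K_(D). The RNA and competitor concentrations used are hypothetical, and the function name is introduced only for this illustration.

```python
def chamber_ratio(rna_total, kd_label, competitor=0.0, kd_competitor=float("inf")):
    """Ratio of labeled ligand in the RNA-containing chamber to that in the
    RNA-free chamber at equilibrium. All concentrations share one unit.
    A competitor that does not bind is modeled with an infinite K_D."""
    rna_free = rna_total / (1.0 + competitor / kd_competitor)
    return 1.0 + rna_free / kd_label

# Hypothetical setup (micromolar): 100 uM RNA, labeled thiamine with K_D = 40 uM
no_competitor = chamber_ratio(100.0, 40.0)
with_binder = chamber_ratio(100.0, 40.0, competitor=1000.0, kd_competitor=40.0)
with_nonbinder = chamber_ratio(100.0, 40.0, competitor=1000.0)
print(no_competitor, with_binder, with_nonbinder)
```

In this model a competitor that binds the RNA drives the ratio back toward unity, whereas a compound that does not bind leaves the shifted distribution unchanged, which mirrors the qualitative behavior described above for unlabeled thiamine and oxythiamine, respectively.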

ii. Binding of Thiamine Derivatives Correlates with Structural Modulation

Close inspection of in-line probing data for thiM reveals two surprising patterns of structural modulation. First, the relative rates of spontaneous fragmentation between reactions containing either thiamine or TPP differ within an internal loop of the thi element (FIG. 13). Nucleotides in this region adopt an increase in structural order in the presence of TPP but not with thiamine, implying this region is somehow involved in formation of a pyrophosphate-recognition pocket. Secondly, the region of the SD sequence is the only portion outside of the thi element that becomes structurally modulated in the presence of TPP.

Specifically, the SD sequence exhibits a significant decrease in spontaneous cleavage relative to reactions lacking TPP, suggesting that the SD is converted into a more structurally constrained form upon binding of TPP. This idea is consistent with a mechanism (FIG. 13) whereby in the absence of TPP the SD has a significant degree of single-stranded character and is accessible for translation initiation. An anti-SD sequence is proposed to interact with an anti-anti-SD sequence within the TPP aptamer under these conditions. In contrast, during conditions of excess TPP, a TPP-RNA complex is formed that disrupts the base pairing of the anti-SD sequence, which is then free to interact directly with the SD and decrease the single-stranded character of the region, hence decreasing efficiency of translation initiation. Preliminary site-directed mutagenesis of the thiM mRNA supports this overall model (see Example 2). Specifically, mutations that disrupt TPP binding also disrupt regulation of translation for thiM-lacZ fusions, while mutations that alter the anti-SD sequence affect regulation but do not affect TPP binding. Thus, binding of thiamine correlates with both the structural accessibility of the SD and the translation efficiency in vivo.
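The competing pairings in this mechanism can be illustrated with a small complementarity check. The sketch below counts the base pairs that two RNA segments can form when aligned antiparallel, and applies it to an anti-SD segment paired either with an SD or with an anti-anti-SD segment. The sequences are hypothetical placeholders chosen for the example and are not the thiM sequences; the function name is likewise introduced only here.

```python
# Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def paired_positions(strand_a, strand_b):
    """Count base pairs formed when strand_b is aligned antiparallel to strand_a
    (both written 5' to 3'), without gaps or bulges."""
    a = strand_a.upper()
    b = strand_b.upper()[::-1]
    return sum((x, y) in PAIRS for x, y in zip(a, b))

# Hypothetical segments, written 5' to 3'
sd = "AGGAG"             # Shine-Dalgarno-like element
anti_sd = "CUCCU"        # complementary to the SD
anti_anti_sd = "AGGAG"   # aptamer segment that can also pair with the anti-SD

off_state = paired_positions(sd, anti_sd)           # anti-SD sequesters the SD
on_state = paired_positions(anti_anti_sd, anti_sd)  # anti-SD held by the aptamer

# A point mutation in the anti-SD weakens both alternative pairings equally
mutant_anti_sd = "CCCCU"
mutant_off = paired_positions(sd, mutant_anti_sd)
mutant_on = paired_positions(anti_anti_sd, mutant_anti_sd)
print(off_state, on_state, mutant_off, mutant_on)
```

Because the anti-SD participates in both alternative structures, a mutation that weakens both pairings to a similar degree need not change the balance between them, which is one way to rationalize why some anti-SD mutations preserve regulation while others abolish it.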

iii. Control of Translation Initiation by Other Riboswitches

Bioinformatics analyses are consistent with molecular mechanisms similar to that of thiM also being recurrent amongst riboswitch RNAs. Specifically, anti-SD and anti-anti-SD structures have been proposed for several riboswitch classes, including FMN (A. G. Vitreschak, et al., Nucleic Acids Research 2002, 30, 3141), lysine, TPP (D. A. Rodionov, et al., Journal of Biological Chemistry 2002, 277, 48949), coenzyme B₁₂ (see Example 1) and SAM. In general, riboswitches from Gram-negative organisms seem to favor expression platforms that exert control over translation, while riboswitches from Gram-positive bacteria appear to predominantly use expression platforms that control transcription termination. The latter can reflect a greater reliance upon multigene transcriptional units in Gram-positive organisms, in which it might be more efficient to preclude transcription of long operons when the gene products are unnecessary.

Biochemical evidence for riboswitch-mediated control over translation initiation has also been obtained for FMN and AdoCbl riboswitches (see Example 1). FMN binding to a riboswitch that regulates the B. subtilis ypaA gene results in alteration of the SD structural context, similar to what was observed for thiM. Interestingly, this genetic control element has also been proposed to regulate ypaA transcription (J. M. Lee, et al., Journal of Bacteriology 2001, 183, 7371), although the leader region does not contain an obvious intrinsic terminator structure. Binding of AdoCbl to the E. coli btuB riboswitch has also been demonstrated to correlate with regulation of translation in vivo.

Certain riboswitch RNAs exert control over transcription and translation using the same RNA sequence. For this class of riboswitches, the SD sequence is contained within an intrinsic terminator. Therefore, the formation of the terminator structure also enacts formation of an SD-sequestering structure. In total, all of these observations suggest that although the thiM and ribD riboswitches represent useful paradigms for riboswitch-mediated control of translation and transcription, respectively, there are likely to be a wide variety of molecular mechanisms utilized by riboswitch RNAs for control of gene expression. Indeed, TPP riboswitches that must be employing different mechanisms of control have been identified in several plant and fungal species (see Example 4). The placement of these RNAs near splice sites in some instances and in the 3′-UTR in others indicates TPP-responsive control over splicing and mRNA stability or expression, respectively.

5. Early Origins?

The FMN, TPP, lysine and AdoCbl riboswitch RNAs are widespread among evolutionarily distant microorganisms, implying an ancient origin for these RNA genetic elements (A. G. Vitreschak, et al., Nucleic Acids Research 2002, 30, 3141; D. A. Rodionov, et al., Journal of Biological Chemistry 2002, 277, 48949). SAM, guanine, and adenine riboswitches are also represented in numerous different genera, although they appear to be primarily limited to Gram-positive bacteria, with a few Gram-negative bacteria as exceptions (see Example 6). In all instances, the structural and sequence conservation of riboswitch classes is limited to the aptamer domain (FIG. 11). This is not unexpected given that the aptamer RNA must preserve its capability to bind the target chemical, which has not been significantly modified through evolution. In contrast, there is considerable sequence and structural diversity between expression platforms, even between riboswitches of the same class and within the same organism. Together, these data hint that the ligand-binding properties of riboswitch aptamer domains have been maintained throughout expansive evolutionary timescales.

Furthermore, the ligands for riboswitch RNAs have been proposed to be functional relics from a hypothetical RNA-based world, in which RNA polymers provided all the necessary catalytic and genomic content for some of the earliest self-replicating organisms (H. B. White, 3rd, Journal of Molecular Evolution 1976, 7, 101; G. F. Joyce, Nature 2002, 418, 214). Therefore it is tempting to speculate that as cofactor-binding RNAs the aptamer domains from riboswitches may have been useful in the context of an RNA-based world for some of the earliest forms of genetic control, for allosteric modulation of ribozymes, or as part of ribozymes that utilized the ligands as catalytic cofactors.

6. Riboswitches as Drug Targets and Genetic Tools

Riboswitches are utilized for control of numerous genes involved in the biosynthesis and transport of prokaryotic enzymatic cofactors. At least 69 genes, representing nearly 2% of the total genomic content of Bacillus subtilis, are under the control of riboswitch RNAs (Table 1), exemplifying the extensive use of riboswitch RNAs for genetic control in prokaryotes (M. Mandal, et al., Cell 2003, 113). Many riboswitch-mediated genes are expected to be essential under most growth conditions. Interference with riboswitch function is then predicted to result in dramatic destabilization of vital metabolic pathways and, perhaps, cessation of growth. Therefore, it seems likely that compounds that closely resemble the target metabolites will bind to riboswitch RNAs and cause a decrease in gene expression. If this analog-induced disruption of gene expression is sufficient, then such compounds might be candidates for antimicrobial applications.

TABLE 1 Distribution of known riboswitch classes in Bacillus subtilis.

Ligand | Transcriptional Unit | Predicted Gene Function(s)
Lysine | lysC | Aspartokinase II
Flavin mononucleotide | ypaA | Putative flavin transporter
 | ribD-ribE-ribBA-ribH | Riboflavin biosynthesis
Adenosylcobalamin | yvrC-yvrB-yvrA-yvqK | Unknown; similar to iron transport proteins
Thiamine pyrophosphate | thiC | Biosynthesis of thiamine pyrimidine moiety
 | tenA1-thiX1-thiY1-thiZ1-thiE2-thiO-thiS-thiG-thiF-thiD | Thiamine biosynthesis
 | ykoF-ykoE-ykoD-ykoC | Unknown
 | yuaJ | Unknown; putative thiamine transporter
 | ylmB | Similar to acetylornithine deacetylase
Guanine | yxjA | Similar to pyrimidine nucleoside transport
 | xpt-pbuX | Xanthine permease
 | pbuG | Hypoxanthine/Guanine permease
 | purE-purK-purB-purC-purS-purQ-purL-purF-purM-purN-purH-purD | Purine biosynthesis
Adenine | ydhL | Unknown
S-adenosylmethionine | yitJ | Putative methylene tetrahydrofolate reductase
 | metI-metC | Methionine biosynthesis
 | ykrT-ykrS | 5′ methylthioadenosine recycling pathway
 | ykrW-ykrX-ykrY-ykrZ | 5′ methylthioadenosine recycling pathway
 | cysH-cysP-sat-cysC-ylnD-ylnE-ylnF | Cysteine biosynthesis
 | yoaD-yoaC-yoaB | Unknown
 | metE | Methionine synthase, B₁₂-independent
 | metK | S-adenosylmethionine synthetase
 | yusC-yusB-yusA | Unknown ABC transporter
 | yxjG | Unknown
 | yxjH | Unknown

Table 1. Gene nomenclature is derived from the SubtiList database except for metI and metC, which are recent designations (S. Auger, et al., Microbiology 2002, 148, 507). Functional roles for ypaA (R. A. Kreneva, et al., Genetika 2000, 36, 1166), yuaJ (D. A. Rodionov, et al., Journal of Biological Chemistry 2002, 277, 48949), ykrTS (B. A. Murphy, et al., Journal of Bacteriology 2002, 184, 2314), and ykrWXYZ (B. A. Murphy, et al., Journal of Bacteriology 2002, 184, 2314) have recently been proposed.

There is clear precedent for the targeting of RNAs with small molecule drugs (G. J. Zaman, et al., Nucleic Acids Research 2002, 30, 62), the most obvious example being that of ribosomal RNA. Several other bacterial-specific RNAs have been explored as candidates for small molecule drug interaction; however, the approach relies upon screening large chemical libraries for those chemicals that fortuitously interact with the RNA of interest, even though the RNA itself does not naturally form a binding pocket for small organic molecules. Riboswitch RNAs therefore exhibit an advantage in antimicrobial development given that they serve as receptors for small molecule ligands, much like their protein receptor counterparts.

In addition to their use as targets for chemical inhibition, understanding of the mechanisms utilized by natural riboswitch RNAs allows adaptation of riboswitches and development of new riboswitches as novel genetic control elements. Numerous aptamer RNA sequences have been identified that interact with a wide variety of small organic molecules (M. Famulok, Current Opinion in Structural Biology 1999, 9, 324). Engineered riboswitches can be generated that respond to non-biological, or otherwise metabolically inert, compounds. Such genetic control elements can be used for a variety of expression control and molecular detection applications.

D. Example 4 Eukaryotic Riboswitches

1. Abstract

Genetic control by metabolite-binding mRNAs is widespread in prokaryotes. These “riboswitches” are typically located in non-coding regions of mRNA, where they selectively bind their target compound and subsequently modulate gene expression. Disclosed are mRNA elements that have been identified in fungi and in plants that match the consensus sequence and structure of thiamine pyrophosphate-binding domains of prokaryotes. In Arabidopsis, the consensus motif resides in the 3′-UTR of a thiamine biosynthetic gene, and the isolated RNA domain binds the corresponding coenzyme in vitro. These results suggest that metabolite-binding mRNAs possibly are involved in eukaryotic gene regulation and that some riboswitches might be representatives of an ancient form of genetic control.

2. Introduction

Riboswitches are genetic control elements that can be found in the 5′-untranslated region of certain messenger RNAs of prokaryotes (see Examples 1-3). These genetic switches exhibit two surprising properties. First, the mRNA is able to form a highly selective binding site for the target metabolite without the aid of proteins. Second, metabolite binding brings about an allosteric reorganization of RNA structure that leads to alterations in genetic expression. Unlike many other genetic control systems, riboswitches do not require metabolite-binding proteins to serve as sensors, and thus offer a direct link between the genetic information that is encoded by an mRNA and its chemical surroundings.

A number of distinct types of riboswitches have been confirmed by biochemical and genetic analyses. For example, a coenzyme B₁₂-binding RNA has been shown (Example 1) to control expression of the Escherichia coli btuB gene, which encodes a cobalamin transport protein. Riboswitches triggered by thiamine pyrophosphate (TPP) have been shown to control operons in E. coli (Example 3) and Bacillus subtilis (Example 6) that are responsible for biosynthesis of this coenzyme. In addition, the RFN element, which frequently is found in the 5′-untranslated region of genes responsible for the biosynthesis or import of riboflavin and FMN, serves as the receptor portion of FMN-dependent riboswitches in Bacillus subtilis (see Examples 3 and 6). Recently, it has been determined that certain S-box motifs that are located in the 5′-UTRs of numerous genes in B. subtilis bind the coenzyme S-adenosylmethionine (SAM) with high affinity and precision. These findings indicate that riboswitches are used to recognize a diverse collection of metabolites and that direct sensing of small molecules by mRNAs is an important form of genetic control for certain organisms. Disclosed herein is evidence that metabolite-binding domains are embedded in certain mRNAs of eukaryotes, indicating that higher organisms might also exploit riboswitches for genetic control.

3. Results

Disclosed are many RNA elements that have been identified in prokaryotes that exhibit sequence similarity to the B₁₂- and SAM-dependent riboswitches. Given the relatively large size and sequence complexity of these RNA motifs, it is unlikely that numerous evolutionary reinventions of the same elements would have occurred. Furthermore, the metabolite triggers of these genetic switches are predicted to have been present in a time before the emergence of proteins (White, 1976; Benner et al., 1989; Jeffares et al., 1998). This is consistent with the known classes of metabolite-sensing RNAs having originated in the ancient RNA world, which is believed to be a time before the emergence of proteins and when metabolism was guided entirely by RNA (Joyce, 2002).

If the present-day riboswitches are of ancient origin, then eukaryotes might possess RNA genetic switches that are descended from the last common ancestor of modern cells. As disclosed herein, several eukaryotes carry RNA domains that conform to the consensus sequence and structure of the metabolite-binding domain of the TPP riboswitch class (FIG. 14A). The mRNAs that carry the TPP-binding domains encode a protein that is homologous to the thiC protein of E. coli. This enzyme catalyzes the conversion of 5-aminoimidazole ribotide (AIR) to hydroxymethylpyrimidine phosphate (HMP-P), which is a key biosynthetic step in the synthesis of thiamine and ultimately TPP (Vander Horn et al., 1993; Begley et al., 1999). For example, a putative thiamine biosynthesis gene of Arabidopsis thaliana carries an RNA element (FIG. 14B) in its 3′-UTR that conforms to the consensus TPP-binding domain. Similar RNA elements are found in rice (Oryza sativa) and bluegrass (Poa secunda). RNA elements that conform to the TPP-binding sequence and structure are also present in fungi such as Neurospora crassa (FIG. 14C) and Fusarium oxysporum. As with plants, the riboswitch homologs in fungi are located in genes that have been implicated in the biosynthesis of thiamine, suggesting that in each case their role is to maintain required coenzyme levels by modulating expression of the appropriate biosynthetic genes. A sequence alignment of the homologous domains found in eukaryotes compared to that of the Gram-negative bacterium E. coli (thiC and thiM) and the Gram-positive bacterium Clostridium acetobutylicum (thiC) is depicted in FIG. 15.

The RNA element corresponding to the consensus TPP-binding domain of A. thaliana (FIG. 14A) was generated by in vitro transcription of a synthetic DNA template and the RNA was subjected to “in-line probing” (FIG. 16A). This method relies on the spontaneous breakdown of RNA phosphodiester linkages, whose pattern of cleavage can be used to reveal the structural and functional features of ligand-binding RNAs (see Examples 1-3). Indeed, the riboswitch-like element exhibits TPP-dependent structural modulation and has a fragmentation pattern that is consistent with the predicted secondary structure of TPP riboswitches from bacteria (see Examples 2 and 3). In addition, this structure-probing method has been used herein to establish that the RNA binds TPP with an apparent dissociation constant (K_(D)) of ˜50 nM (FIG. 16B), which is similar to that determined previously for an E. coli riboswitch variant. Similarly, it has been demonstrated that the sequence elements of fungi that correspond to the TPP riboswitch consensus also bind TPP with high affinity.

Sequestering of the ribosome binding site and transcription termination are demonstrated mechanisms for TPP riboswitches in E. coli (FIG. 17). Since the TPP-binding element in plants is located immediately upstream from the polyA tail, it is possible that metabolite binding might regulate mRNA processing and stability. Alternatively, a consensus TPP-binding sequence (FIG. 14C) identified in the fungal genome of N. crassa resides in an intron, suggesting that RNA splicing might also be guided by metabolite-binding pre-mRNAs. In prokaryotes, ligand binding typically brings about allosteric changes in the Watson-Crick base pairing arrangements near gene control elements such as transcription terminators and ribosome binding sites. Likewise, secondary structure rearrangements by metabolite-binding riboswitches can be used to modulate a greater variety of RNA processing, transport and expression pathways in eukaryotes.

Although it is likely that TPP-binding domains and those for coenzyme B₁₂, FMN, and SAM are of ancient origin, it is possible that other examples of metabolite-binding mRNAs have emerged more recently in evolution. These newer riboswitches would be more narrowly distributed across the phylogenetic landscape, so efforts to search for new riboswitches that are triggered by compounds that are not ancient and universally distributed will be difficult. Regardless of the scope of riboswitch use in modern organisms, both natural and engineered riboswitches could have significant utility. Given the central role that known riboswitches serve in modulating the concentration of key coenzymes, these RNAs can serve as new targets for drug discovery efforts. Therefore, reverse engineering of natural riboswitches can be used to establish a conceptual basis for creating designer riboswitches for the purposeful control of eukaryotic genes.

E. Example 5 Lysine Riboswitches

The precise control of gene expression in response to changes in the chemical and physical environment of cells requires selective interactions between biochemical sensor elements and the molecules that carry or interpret genetic information. Most known genetic factors that respond to such environmental changes are proteins (Ptashne and Gann 2002). However, a number of studies (e.g. see Examples 1-3 and 6-8) have demonstrated that natural RNA molecules can also recognize small organic compounds and harness allosteric changes to control the expression of adjacent genes. These metabolite-binding RNA domains, termed riboswitches, typically are embedded within the 5′-UTRs of mRNAs and control the expression of proteins involved in the biosynthesis or import of the target compound. Riboswitches also play an important role in controlling fundamental metabolic pathways in bacteria involved in sulfur metabolism, and in the biosynthesis of various coenzymes and purines (see Example 6). Furthermore, riboswitches are phylogenetically widespread amongst eubacterial organisms, and both sequence and biochemical data suggest that riboswitches are also present in the genes of eukaryotes (see Example 4).

These observations indicate that riboswitches likely comprise a widely used mechanism of genetic control in living systems. Transcription of the lysC gene of B. subtilis is repressed by high concentrations of lysine (Kochhar, S., and Paulus, H. 1996, Microbiol. 142:1635-1639; Mader, U., et al., 2002, J. Bacteriol. 184:4288-4295; Patte, J. C. 1996. Biosynthesis of lysine and threonine. In: Escherichia coli and Salmonella: Cellular and Molecular Biology, F. C. Neidhardt, et al., eds., Vol. 1, pp. 528-541. ASM Press, Washington, D.C.; Patte, J.-C., et al., 1998, FEMS Microbiol. Lett. 169:165-170), but no protein factor has been identified that serves as the genetic regulator (Liao, H.-H., and Hseu, T.-H. 1998, FEMS Microbiol. Lett. 168:31-36). The lysC gene encodes aspartokinase II, which catalyzes the first step in the metabolic pathway that converts L-aspartic acid into L-lysine (Belitsky, B. R. 2002. Biosynthesis of amino acids of the glutamate and aspartate families, alanine, and polyamines. In: Bacillus subtilis and its Closest Relatives: from Genes to Cells. A. L. Sonenshein, J. A. Hoch, and R. Losick, eds., ASM Press, Washington, D.C.). Interestingly, several efforts have been successful in generating mutants that exhibit constitutive expression of the aspartokinase II enzyme, and all mutations map to the 5′-UTR of the lysC mRNA (Boy, E., et al., 1979. Biochimie 61:1151-1160; Lu, Y., et al., 1991, J. Gen. Microbiol. 137:1135-1141; Lu, Y., et al., 1992, FEMS Microbiol. Lett. 92:23-27). Furthermore, a significant level of sequence similarity was identified between the B. subtilis and E. coli lysC 5′-UTRs (Patte, J.-C., et al., 1998, FEMS Microbiol. Lett. 169:165-170). These characteristics are consistent with a lysine-responsive riboswitch serving as the genetic control element for this gene.

1. Materials and Methods

i. Chemicals and Oligonucleotides

L-lysine, all analogs with the exception of L-α-homolysine (compound 6, FIG. 20A), tritiated lysine (L-Lysine-[4,5-³H(N)]), and the four dipeptides were purchased from Sigma. A protocol adapted from that reported previously (Dong, Z. 1992, Tetrahedron Lett. 33:7725-7726) was used to synthesize L-α-homolysine. Purity and integrity of synthetic L-α-homolysine was confirmed by TLC and NMR.

DNA oligonucleotides were synthesized by the HHMI Keck Foundation Biotechnology Resource Center at Yale University, purified by denaturing PAGE and eluted from the gel by crush-soaking in 10 mM Tris-HCl (pH 7.5 at 23° C.), 200 mM NaCl, and 1 mM EDTA. Oligonucleotides were recovered from solution by precipitation with ethanol.

ii. Phylogenetic Analyses

L box domains were identified by sequence similarity to the B. subtilis lysC 5′-UTR. Ultimately, the program was used to search for degenerate matches to the pattern (WAGAGGNGC [10] A [3] RKTA [50] RRGR [10] CCGARR [40] GG [13] VAA [13] YTGTCA [36] TGRWG [2] CTWY) (SEQ ID NO:376); however, less complete versions of this pattern were used with iterative refinements to identify the consensus sequence and structure of the L box motif. Bracketed numbers are variable gaps with constrained maximum lengths denoted. Nucleotide notations are as follows: Y=pyrimidine; R=purine; W=A or T; K=G or T; V=A, G or C. Up to six violations of this pattern were permitted when forming the phylogeny depicted in FIG. 18.
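By way of illustration only, the degenerate pattern above can be expressed as a regular expression. The following Python sketch is not the search program referred to in the text; the names IUPAC, pattern_to_regex and find_matches are hypothetical. It assumes that each bracketed number denotes a gap of zero up to that many nucleotides, that N denotes any nucleotide (the usual IUPAC meaning, which the text does not state), and that sequences are written in the DNA alphabet. The tolerance of up to six violations is not reproduced; only exact matches are reported.

```python
import re

# IUPAC codes used in the pattern; N = any nucleotide is an assumption
# taken from the standard IUPAC table, not from the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "R": "[AG]", "W": "[AT]", "K": "[GT]",
    "V": "[ACG]", "N": "[ACGT]",
}

L_BOX_PATTERN = (
    "WAGAGGNGC [10] A [3] RKTA [50] RRGR [10] CCGARR [40] GG [13] "
    "VAA [13] YTGTCA [36] TGRWG [2] CTWY"
)


def pattern_to_regex(pattern: str) -> str:
    """Translate 'MOTIF [n] MOTIF ...' into a regular expression string."""
    parts = []
    for token in pattern.split():
        if token.startswith("[") and token.endswith("]"):
            # Bracketed number: assumed to mean a gap of 0..n nucleotides.
            parts.append("[ACGT]{0,%d}" % int(token[1:-1]))
        else:
            parts.append("".join(IUPAC[base] for base in token))
    return "".join(parts)


def find_matches(sequence: str, pattern: str = L_BOX_PATTERN):
    """Return (start, end) spans of exact, non-overlapping matches."""
    regex = re.compile(pattern_to_regex(pattern))
    return [m.span() for m in regex.finditer(sequence.upper())]
```

A search that tolerates mismatches, as was done for FIG. 18, would require an approximate-matching approach rather than a plain regular expression.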

iii. In-Line Probing of RNA Constructs

The B. subtilis 315 lysC, 237 lysC and 179 lysC RNAs were prepared by in vitro transcription using T7 RNA polymerase and the appropriate PCR DNA templates. RNA transcripts were dephosphorylated and subsequently 5′ ³²P-labeled using a protocol similar to that described previously (Seetharaman, S. et al., 2001, Nature Biotechnol. 19, 336-341). Labeled precursor RNAs (˜2 nM) were subjected to in-line probing using conditions similar to those described in Examples 1 and 2. Reactions (10 μL) were incubated for 40 hr at 25° C. in a buffer containing 50 mM Tris (pH 8.5 at 25° C.), 20 mM MgCl₂ and 100 mM KCl in the presence or absence of L-lysine or various analogs as indicated for each experiment. Denaturing 10% PAGE was used to separate spontaneous cleavage products, which were detected and quantitated by using a Molecular Dynamics PhosphorImager and ImageQuaNT software.

iv. Equilibrium Dialysis and Scatchard Analyses

Equilibrium dialysis assays were conducted using a DispoEquilibrium Dialyzer (ED-1, Harvard Bioscience), wherein two chambers a and b were separated by a 5,000 MWCO membrane. The final composition of buffer included 50 mM Tris-HCl (pH 8.5 at 25° C.), 20 mM MgCl₂ and 100 mM KCl (30 μL delivered to each chamber). Assays were initiated by the addition of ³H-lysine (50 nM initial concentration prior to equilibration; 40 Ci mmol⁻¹; 15,000 cpm) to chamber a. When present, RNA (179 lysC) was introduced into chamber b to yield a concentration of 10 μM. After 10 hr of equilibration at 25° C., a 3-μL aliquot from each chamber was removed for quantitation by liquid scintillation counting. Competition assays were established by delivering an additional 3 μL of buffer to a and an equivalent volume of buffer containing 50 μM unlabeled L-lysine, D-lysine, L-ornithine, or L-lysine hydroxamate as indicated to b. After 10 hr of additional incubation at 25° C., 3-μL aliquots were again drawn for quantitation of tritium distribution.

Scatchard data points were generated as described above with the following exceptions. RNA was added to chamber b to yield a concentration of 1 μM RNA and equilibration of the dialysis mixtures proceeded for 20 hr. In addition, ³H-lysine concentrations were varied from 50 nM to 2.5 μM. Calculation of points on the Scatchard plot from the equilibrium dialysis data was carried out as described elsewhere herein.
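The calculation itself is described elsewhere herein. As a hedged illustration of how a Scatchard point is conventionally derived from a two-chamber dialysis reading, the Python sketch below can be considered. It assumes equal chamber volumes, full equilibration, a ligand that crosses the membrane freely and RNA that does not. The function names scatchard_point and fit_scatchard and their parameters are hypothetical.

```python
def scatchard_point(fraction_b, ligand_loaded, rna_total):
    """Convert one dialysis reading into a Scatchard point (r, r / free).

    fraction_b    -- fraction of total tritium found in chamber b (RNA side)
    ligand_loaded -- ligand concentration loaded into chamber a before
                     equilibration
    rna_total     -- RNA concentration in chamber b (same units)
    """
    # Chamber a holds free ligand only; chamber b holds free plus bound.
    free = (1.0 - fraction_b) * ligand_loaded
    bound = (2.0 * fraction_b - 1.0) * ligand_loaded
    if free <= 0.0 or bound < 0.0:
        raise ValueError("fraction_b must lie in the interval [0.5, 1)")
    r = bound / rna_total  # moles of ligand bound per mole of RNA
    return r, r / free


def fit_scatchard(points):
    """Least-squares line through (r, r / free) points.

    For a single class of sites, r / free = (n - r) / K_D, so the slope is
    -1 / K_D and the x-intercept is n. Returns (K_D, n).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, -intercept / slope
```

The x-intercept returned by fit_scatchard estimates the number of ligand binding sites per RNA, which is how such a plot addresses stoichiometry.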

v. In Vitro Transcription Termination Assays

Transcription termination assays were conducted using a method of single-round transcription adapted from that described previously (Landick, R., et al., 1996, Methods Enzymol. 274:334-353). The template for lysC 5′-UTR transcription was altered (C6G of the RNA) such that the first C residue of the nascent RNA is not encountered until position 17. Polymerization was initiated by the addition of a mixture of ApA dinucleotide (1.35 μM), GTP and UTP (2.5 μM each) plus unlabeled ATP (1 μM) and [α-³²P]-ATP (4 μCi), which was incubated for 10 min. Halted complexes were restarted by the addition of 150 μM each of the four NTPs, and heparin (0.1 mg mL⁻¹) was simultaneously added to prevent polymerases from initiating transcription on new templates. Transcription mixtures also contained 20 mM Tris-HCl (pH 8.0 at 23° C.), 20 mM NaCl, 14 mM MgCl₂, 0.1 mM EDTA, 0.01 mg/mL BSA, 1% v/v glycerol, 4 pmoles DNA template, 0.045 U μL⁻¹ E. coli RNA polymerase (Epicenter, Madison, Wis.), and 10 mM of L-lysine or the lysine analog as indicated for each experiment. Reactions were incubated for an additional 20 min at 37° C. and the products were examined by denaturing 6% PAGE followed by analysis using a PhosphorImager.

vi. In Vivo Analysis of lysC Genetic Variants

Fusions of the lysC 5′-UTR with a lacZ reporter gene were used to assess the function of the lysine riboswitch in vivo using methods similar to those described elsewhere herein. Briefly, the lysC 5′-UTR, comprising the promoter and the first 315 nucleotides of the transcription template, was prepared as an EcoRI-BamHI fragment by PCR. Sequence variants M1 through M3, G39A, and G40A were generated by PCR amplification of the wild-type construct using primers that carried the desired mutations. The PCR products were cloned into pDG1661 immediately upstream of the lacZ reporter gene and the integrity of the resulting clones was confirmed by sequencing. Transformations of pDG1661 variants into B. subtilis strain 1A40 (obtained from the Bacillus Genetic Stock Center, Columbus, Ohio) were performed and the correct transformants were identified by selecting for chloramphenicol resistance and screening for spectinomycin sensitivity.

Cells were grown with shaking at 37° C. either in rich medium (2XYT broth or tryptose blood agar base) or defined medium (0.5% w/v glucose, 2 g L⁻¹ (NH₄)₂SO₄, 18.3 g L⁻¹ K₂HPO₄·3H₂O, 6 g L⁻¹ KH₂PO₄, 1 g L⁻¹ sodium citrate, 0.2 g L⁻¹ MgSO₄·7H₂O, 5 μM MnCl₂, and 5 μM CaCl₂). Methionine, lysine, and tryptophan were added to 50 μg mL⁻¹ for routine growth. Growth under lysine-limiting conditions was established by incubation under routine growth conditions in defined medium to an A₅₉₅ of 0.1, at which time the cells were pelleted by centrifugation, resuspended in minimal medium, split into five aliquots, and supplemented with five different media types as defined in the legend to FIG. 22C. Cultures were incubated for an additional 3 hr before performing β-galactosidase assays.

2. Results

i. The L Box: a Conserved mRNA Element that is Important for Genetic Control

Riboswitches are typically formed by close juxtaposition of a metabolite-binding ‘aptamer’ domain and an ‘expression platform’ that interfaces with mRNA elements necessary for gene expression. Although the RNA sequences and structural components that serve as the expression platform change significantly throughout evolution, the aptamer domain largely retains the sequence composition of its ligand-binding core along with the major secondary-structure features. This permits the use of phylogenetic analyses to identify related RNA domains and to establish a consensus sequence and structure for a given class of riboswitches.

Beginning with the sequence homology reported to exist between the lysC 5′-UTRs of three bacterial species (Patte, J.-C., et al., 1998, FEMS Microbiol. Lett. 169:165-170), the number of representatives was expanded using an algorithm that searches for related sequences and secondary structures (e.g. see Examples 4 and 6). Thirty-one representatives of this RNA domain, termed the “L box”, were identified in the 5′-UTRs of lysC homologs and other genes related to lysine biosynthesis from a number of Gram-positive and Gram-negative organisms (FIG. 18). The sequence alignment reveals that the RNA forms a five-stem junction wherein major base-paired domains are interspersed with 56 highly conserved nucleotides (FIG. 19A). Furthermore, the base-paired elements P2, P2a, P2b, P3 and P4 each appear to conform to specific length restrictions, suggesting that they are integral participants in the formation of a highly structured RNA. It was also noticed that conserved sequences in the junction between stems P2 and P2a conform to a “loop E” motif, which is an RNA element that occurs frequently in other highly-structured RNAs (e.g. see Leontis, N. B., and Westhof, E. 1998, J. Mol. Biol. 283:571-583).

The L box domain of the B. subtilis lysC mRNA resides immediately upstream from a putative transcription terminator stem (Kochhar, S., and Paulus, H. 1996, Microbiol. 142:1635-1639; Patte, J.-C., et al., 1998, FEMS Microbiol. Lett. 169:165-170). In several other riboswitches with similar arrangements (e.g. Examples 3 and 6), the 5′-UTR can be trimmed to separate the minimal aptamer domain from the adjacent expression platform. An RNA fragment (237 lysC, FIG. 19B), encompassing nucleotides 1 through 237 of the lysC 5′-UTR, was generated and examined for allosteric function. This construct, which excludes the putative transcription terminator stem, was subjected to structural analysis by in-line probing (Soukup, G. A. and Breaker, R. R. 1999, RNA 5:1308-1325) to determine whether the presence of lysine alters RNA structure. It was observed that 237 lysC exhibits a pattern of spontaneous RNA cleavage (FIG. 19C) that is consistent with the secondary structure model of the L box motif constructed from phylogenetic sequence data. Furthermore, it was found that the addition of 10 μM L-lysine causes significant changes in the cleavage pattern at four locations along the RNA chain, indicating that allosteric modulation of the 5′-UTR fragment is occurring. In addition, the same pattern of spontaneous cleavage and amino acid-dependent structural modulation was observed when using the 179 lysC RNA construct, which encompasses only the most highly-conserved portion of the L-box motif (nucleotides 27 through 205 of the lysC 5′-UTR).

A reduction of spontaneous cleavage is observed in each of the four sites of metabolite-induced structural modulation. In most instances, a reduction in spontaneous cleavage is due to the nucleotides becoming more ordered in the complex formed between RNA and its ligand (Soukup, G. A. and Breaker, R. R. 1999, RNA 5:1308-1325). Interestingly, these four groups of nucleotides are located at the center of the 5-stem junction of the L box secondary structure model (FIG. 19B), implying that these nucleotides are directly involved in recognizing the amino acid target. Similar patterns of ligand-induced structural modulation have been observed with the aptamer domains of other riboswitches (see Examples 2, 3 and 6).

ii. The Lysine Aptamer Exhibits High Specificity for L-Lysine and Discriminates Against Closely-Related Analogs

Riboswitches, like their counterpart genetic factors made of protein, must exhibit sufficient specificity and affinity for their target metabolite in order to achieve precision genetic control. To examine the molecular recognition characteristics of the lysC L box domain, a series of in-line probing assays was performed using various analogs of lysine at 100 μM. The properties of a lysine analog collection were examined, wherein each compound carries minimal chemical changes relative to L-lysine (FIG. 20A). Nearly every chemical alteration to the amino acid renders the compound incapable of causing a structural modulation of the 179 lysC RNA (FIG. 20B). Perhaps most striking is that the RNA does not undergo structural modulation in the presence of D-lysine, which differs from L-lysine by the stereochemical configuration at a single carbon center.

The absence of significant structural modulation in the presence of D-lysine and of other analogs indicates that at least three points of contact are being made between the RNA and its amino acid target. Specifically, the observation that analogs 1, 3, and 4 fail to induce structural modulation is consistent with contacts being made to the amino and carboxy groups of the chain atoms, and to the amino group of the side chain, respectively. Moreover, the failures of compounds 2, 5, 6, 7 and 8 to induce conformational change in the RNA indicate that the aptamer forms a highly discriminating binding pocket that can measure the length and the integrity of the alkyl side chain. This high level of molecular discrimination is of particular biological significance, as a genetic switch for lysine most likely must respond exclusively to L-lysine and not closely related natural compounds.

Similarly, the allosteric response of the 179 lysC RNA to various dipeptides and acid-hydrolyzed dipeptides was examined. It was hypothesized that dipeptides should not trigger allosteric modulation of RNA structure, but that acid-mediated hydrolysis (FIG. 20C) should render dipeptides carrying at least one lysyl residue active. As predicted, 179 lysC does not undergo allosteric modulation upon the addition of the dipeptides lys-lys, lys-ala, ala-lys, or ala-ala (FIG. 20D). However, the three dipeptides that carry at least one lysyl residue induce structural modulation of RNA upon pretreatment of the dipeptides with 6 N HCl at 115° C. for 23 hr, followed by evaporation and neutralization. The extent of structural modulation (FIG. 20E) indicates that the samples containing the hydrolyzed lysine-containing dipeptides fully saturate the lysC aptamer, which is in accordance with the acid-mediated release of saturating amounts (greater than 1 μM; see below) of L-lysine.

It was also observed that an intermediate level of structural modulation occurs when D-lysine is pre-treated with HCl. Interestingly, the published rate of epimerization between D- and L-lysine (Engel, M. H., and Hare, P. E. 1982. Racemization rates of the basic amino acids. Year Book Carnegie Inst. Washington 81:422-425) is sufficient to account for the approximately 1 μM of L-lysine that is needed to produce half-maximal structural modulation (FIG. 20E). These results are consistent with lysine acting as the molecular ligand for the lysC aptamer, and indicate that the RNA conformational changes are not due to unknown contaminants of the commercial L-lysine preparation.

iii. Binding Affinity and Stoichiometry of the B. subtilis L-Lysine Aptamer

An approximation of the dissociation constant (K_(D)) was made by conducting in-line probing assays with 179 lysC using various concentrations of L-lysine (FIG. 21A). The sites of structural modulation exhibit progressively lower levels of spontaneous cleavage in response to increasing concentrations of ligand. A plot of the extent of RNA cleavage versus concentration of L-lysine (FIG. 21B) indicates that half-maximal structural modulation occurs when approximately 1 μM amino acid is present in the mixture, thus reflecting the apparent K_(D) of the 179 lysC for its target ligand.
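
The link between ligand concentration and half-maximal modulation can be illustrated with a simple one-site binding isotherm. The Python sketch below is illustrative only: it assumes that the ligand is in large excess over the trace-labeled RNA, so that free and total ligand concentrations are effectively equal. The function name and the titration points are hypothetical and are not taken from FIG. 21B.

```python
def fraction_bound(ligand_conc_um, kd_um):
    """One-site binding isotherm: fraction of RNA in the ligand-bound state.

    Assumes free ligand ~ total ligand (ligand in large excess over RNA).
    """
    return ligand_conc_um / (kd_um + ligand_conc_um)


# Apparent K_D reported in the text for 179 lysC (micromolar).
APPARENT_KD_UM = 1.0

# Hypothetical titration points (micromolar), chosen only for illustration.
for conc_um in (0.01, 0.1, 1.0, 10.0, 100.0):
    bound = fraction_bound(conc_um, APPARENT_KD_UM)
    print(f"{conc_um:8.2f} uM L-lysine -> fraction bound {bound:.3f}")
```

Under this model the fraction bound is exactly one half when the ligand concentration equals K_(D), which is why the midpoint of the cleavage-versus-concentration plot is read as the apparent K_(D).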

A longer construct that encompasses structural elements predicted to be involved in transcription termination exhibits a significantly poorer affinity for L-lysine. Specifically, an RNA construct encompassing nucleotides 1 through 315 of the lysC 5′-UTR was found by in-line probing to exhibit an apparent K_(D) of ~500 μM. Similar differences in ligand affinities for other riboswitches have been observed, wherein the minimized aptamer binds its cognate ligand more tightly than does the same aptamer in the context of the complete riboswitch (aptamer plus the adjoining expression platform). This is most likely due to the presence of competing secondary or tertiary structures that might be important for the function of the riboswitch as a genetic control element, but that reduce ligand binding affinity by reducing pre-organization of the aptamer domain.

Equilibrium dialysis also was used to examine the affinity and specificity of the 179 lysC aptamer for its target (FIG. 21C). In the absence of RNA, tritiated L-lysine is expected to distribute equally between the two chambers (a and b) of an equilibrium dialysis apparatus. However, the addition of excess aptamer to one chamber of the system should shift the distribution of tritium towards this chamber as a result of complex formation. This asymmetric distribution of tritium is expected to be restored to unity by the addition of a large excess of unlabeled competitor ligand, which displaces the bulk of the tritiated lysine from the RNA. As expected, the fraction of tritiated L-lysine in chamber b of the equilibrium dialysis apparatus is ~0.5 in the absence of RNA (FIG. 21C) after a 10 hr incubation. This fraction is altered to ~0.8 after incubation when a 200-fold excess of 179 lysC (10 μM) is added to chamber b, while a symmetric distribution of tritium is restored upon incubation for an additional 10 hours after the introduction of excess (50 μM) unlabeled L-lysine. Furthermore, D-lysine and L-ornithine do not restore equal distribution of tritium, which is consistent with their failure to modulate RNA structure as determined by in-line probing.

A Scatchard plot also was created by using data from a series of equilibrium dialysis experiments conducted under various concentrations of tritiated L-lysine (FIG. 21D). The slope of the resulting line indicates that the 179 lysC RNA binds to L-lysine with an apparent K_(D) of ~1 μM, which is consistent with that observed by using in-line probing. Furthermore, the x intercept of the line occurs near an r value of 1, which demonstrates that the RNA forms a 1:1 complex with its ligand.
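
The way a Scatchard plot yields both K_(D) and stoichiometry can be sketched as follows. In a Scatchard analysis, r (moles of ligand bound per mole of RNA) is plotted on the x axis against r divided by the free ligand concentration on the y axis. For n identical, independent sites the points fall on a line with slope −1/K_(D) and x intercept n. The Python example below is a minimal sketch: the data are synthetic values generated from an assumed K_(D) of 1 μM and n of 1, not the measurements of FIG. 21D, and the function name is hypothetical.

```python
def scatchard_fit(r_values, free_ligand_um):
    """Fit a line to (r, r/[L]free) by ordinary least squares.

    Returns (kd_um, n_sites), where slope = -1/K_D and x intercept = n.
    """
    xs = list(r_values)
    ys = [r / free for r, free in zip(r_values, free_ligand_um)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd_um = -1.0 / slope            # slope of a Scatchard line is -1/K_D
    n_sites = -intercept / slope    # x intercept gives the number of sites
    return kd_um, n_sites


# Synthetic, noise-free data generated from assumed parameters (illustration only).
ASSUMED_KD_UM = 1.0
ASSUMED_SITES = 1.0
free_um = [0.1, 0.3, 1.0, 3.0, 10.0]
r_bound = [ASSUMED_SITES * f / (ASSUMED_KD_UM + f) for f in free_um]

kd_est, n_est = scatchard_fit(r_bound, free_um)
print(f"estimated K_D = {kd_est:.3f} uM, estimated sites per RNA = {n_est:.3f}")
```

With real data the points scatter about the line, so the fitted slope and intercept carry experimental uncertainty that this noise-free sketch does not model.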

iv. The Lysine Aptamer and Adjacent Sequences Function as an Amino Acid-Dependent Riboswitch

With a number of riboswitches examined to date, there is a discernable set of structures residing immediately downstream of the aptamer domain that serve to control gene expression in response to ligand binding. Typically, the structure of this “expression platform” is modulated by metabolite binding to the aptamer domain. The alternative structure subsequently leads to modulation of transcription or translation processes. For example, the TPP riboswitch on the thiM mRNA of E. coli carries an expression platform that appears to preclude ribosome binding to the Shine-Dalgarno sequence of the adjacent coding region (see Example 2). Similarly, the expression platforms of various riboswitches from B. subtilis undergo ligand-induced formation of a stem-loop structure that induces transcription termination (e.g. Examples 3, 6 and 7).

It has been reported that the lysC mRNA undergoes transcription termination in cultured B. subtilis cells grown in the presence of excess L-lysine (Kochhar, S., and Paulus, H. 1996, Microbiol. 142:1635-1639). It was observed herein that a sequence domain that participates in forming the P1 stem of the lysC aptamer is complementary to a portion of the putative terminator hairpin that resides ~30 nucleotides downstream (FIG. 22A). This architecture is similar to that of several other riboswitches, some of which exhibit termination of transcription in vitro upon addition of the corresponding ligand as cited above. Therefore, the lysC leader sequence appears to serve as an L-lysine-specific riboswitch that induces transcription termination by modulating the formation of a terminator stem.

In vitro transcription assays were conducted in the absence and presence of L-lysine and several analogs (FIG. 22B, left). In the absence of added ligand, single-round transcription in vitro using E. coli RNA polymerase produces terminated product corresponding to ~36% of the total transcription yield. In contrast, the amount of terminated product increases to ~76% when 10 mM L-lysine is present during in vitro transcription. Neither D-lysine nor L-ornithine induces termination, which is consistent with the fact that these compounds are not recognized by the lysine aptamer domain and thus are not expected to trigger transcription termination.

The configuration of the expression platform for the lysC gene in B. subtilis strongly implicates a transcription termination mechanism, wherein the binding of L-lysine is expected to stabilize the P1 stem, thus permitting formation of the terminator hairpin (FIG. 22A). This proposed mechanism was examined by placing mutations within the critical pairing elements and by assessing lysine-induced transcription termination (FIG. 22B, center). Specifically, variant M1 carries two mutations that disrupt the formation of the terminator stem. This variant loses lysine-dependent modulation of transcription termination, and produces greater transcriptional read-through relative to the wild-type construct. M2 carries a total of four mutations that compensate for the disruption of the terminator stem, but that cause disruption of the anti-terminator stem. This construct also loses lysine-dependent modulation but, as expected, yields a greater amount of terminated product. Finally, the six-nucleotide variant M3, which carries the same mutations as M2 plus two additional mutations that restore the anti-terminator base-pairing potential, exhibits near wild-type performance with regard to lysine-mediated modulation of transcription termination. These findings are consistent with a riboswitch mechanism wherein lysine binding precludes formation of an anti-terminator stem, thus increasing transcription termination by formation of an intrinsic terminator structure.

v. Evidence that Riboswitches Serve as Antibiotic Targets

Unlike other lysine analogs, both L-lysine hydroxamate and the antimicrobial compound thiosine (S-(2-aminoethyl)-L-cysteine; FIG. 22A, inset) cause an increase in transcription termination (FIG. 22B, left). These two compounds exhibit the best apparent K_(D) values of any of the analogs tested, with values for L-lysine hydroxamate and thiosine of ~100 μM and ~30 μM, respectively (data not shown). In previous studies, a series of mutants were identified in B. subtilis (Vold, B., et al., 1975, J. Bacteriol. 121:970-974; Lu, Y., et al., 1992, FEMS Microbiol. Lett. 92:23-27) and E. coli (Patte, J.-C., et al., 1998, FEMS Microbiol. Lett. 169:165-170) that cause resistance to thiosine and cause derepression of lysC expression. These mutations all map to the lysine aptamer domain (see FIG. 22A for select B. subtilis mutants), and all appear to cause disruptions in the conserved elements or the base-pairing integrity of the structure.

The functional integrity of two thiosine-resistant mutants (G39A and G40A) was examined by equilibrium dialysis and by in-line probing, and both mutants fail to exhibit lysine-binding activity. Furthermore, RNA constructs that carry mutations in the otherwise conserved P1-P2 junction fail to undergo lysine-dependent transcription termination in vitro (FIG. 22B, right). These findings suggest that the antimicrobial action of thiosine might at least partially be due to direct binding of the analog to the lysine riboswitch, causing repression of aspartokinase expression to a level that is deleterious to cell growth.

The function of the wild-type 5′-UTR of lysC and of the two thiosine-resistant mutants was also examined in vivo by fusion to a lacZ reporter gene. The wild-type riboswitch domain exhibits ligand-dependent modulation upon addition of L-lysine, whereas the G39A and G40A mutants fail to regulate β-galactosidase expression (FIG. 22C, medium II versus III). In contrast, lysine hydroxamate fails to repress expression of the reporter gene in vivo (medium IV), indicating that this compound might not attain a sufficiently high concentration inside cells to trigger transcription termination. As with lysine, thiosine also represses β-galactosidase expression for the wild-type construct, but not the two derepression mutants (medium V). This latter observation is consistent with the antimicrobial action of thiosine being due largely to its function as an effector for the lysine riboswitch.

3. Conclusions

The first mutants that caused deregulation of lysine biosynthesis in B. subtilis were identified nearly three decades ago (Vold, B., et al., 1975, J. Bacteriol. 121:970-974); however, the mechanism of genetic regulation has remained unresolved. It is demonstrated herein that the 5′-UTR of the lysC mRNA from B. subtilis serves as a riboswitch that responds to the amino acid lysine. The derepressed mutants isolated in the original study cause disruption of the aptamer domain of the riboswitch, such that the ligand is no longer bound by the RNA. Furthermore, in vivo expression studies using mutant lysC fragment-reporter gene fusions indicate that these riboswitch mutations most likely cause unregulated over-expression of aspartokinase, which catalyzes the first step in the biosynthetic pathway to lysine and several other amino acids.

Bacteria use various mechanisms to respond genetically to amino acid concentrations. Two of the more prominent mechanisms, translation-mediated transcription attenuation and T box-dependent mechanisms (Henkin, T. M., and Yanofsky, C. 2002, BioEssays 24:700-707), both sense the presence of non-aminoacylated tRNAs. Indeed, 18 of the 20 common amino acids in B. subtilis appear to be detected indirectly through the use of T box elements. Interestingly, there is no known tRNA^(lys)-dependent T-box in any organism, and presumably the lysine riboswitch described herein serves as the genetic sensor for this amino acid in the absence of a corresponding T box. Moreover, the genetic distribution of lysine riboswitches affiliated with the nhaC gene from several organisms indicates that this RNA genetic element might be a key regulator of cellular pH.

Since the lysC mRNA functions as a receptor for L-lysine, the Lys riboswitch can serve as a drug target. (See other examples, Hesselberth, J. R., and Ellington, A. D. 2002, Nature Struct. Biol. 9:891-893; Sudarsan, N., et al., 2003, RNA 9:644-647). The lysine riboswitch, and perhaps other classes of riboswitches as well, can be targeted by analogs that selectively bind to the riboswitch and induce genetic modulation. In B. subtilis, an analog of lysine that triggers the riboswitch would be expected to function as an antimicrobial agent, because the reduction of aspartokinase expression should induce starvation for lysine and other critical metabolites. The finding that thiosine binds to the lysine aptamer in vitro, and causes down-regulation of a reporter construct fused to the wild-type riboswitch, provides support for the view that riboswitches are a newly recognized class of targets for drug discovery.

Recent discoveries have been elucidating the roles of small RNAs in guiding gene expression in a wide range of organisms (for a review see Gottesman, S. 2002, Genes Dev. 16:2829-2842). It is apparent that small RNAs, including riboswitch domains embedded within mRNAs, can control gene expression by a wide range of mechanisms. Unlike other RNA genetic control elements, riboswitches directly bind to metabolites and control the expression of genes that are involved in the import and biosynthesis of a number of fundamental metabolites. Riboswitches examined previously respond to compounds that are chemically related to nucleotides. However, the existence of a class of riboswitches that responds to a small amino acid with high selectivity serves as proof that natural RNA switches can detect and respond to a greater range of metabolite classes.

F. Example 6 Guanine and Other Riboswitches in Bacillus subtilis and Other Bacteria

1. Summary

Riboswitches are metabolite-binding domains within certain messenger RNAs that serve as precision sensors for their corresponding targets. Allosteric rearrangement of mRNA structure is mediated by ligand binding, and this results in modulation of gene expression. A class of riboswitches that selectively recognizes guanine and becomes saturated at concentrations as low as 5 nM is disclosed herein. In Bacillus subtilis, this mRNA motif is located on at least five separate transcriptional units that together encode 17 genes that are mostly involved in purine transport and purine nucleotide biosynthesis. These findings provide further examples of mRNAs that sense metabolites and that control gene expression without the need for protein factors. Furthermore, it is now apparent that riboswitches contribute to the regulation of numerous fundamental metabolic pathways in certain bacteria.

2. Introduction

It is widely understood that the interplay of protein factors and nucleic acids guides the complex regulatory networks for genetic expression in modern cells. In most instances, protein factors appear to be well-suited agents for maintaining genetic expression networks. Proteins can adopt complex shapes and carry out a variety of functions that permit living systems to sense accurately their chemical and physical environments. Protein factors that respond to metabolites typically act by binding DNA to modulate transcription initiation (e.g. the lac repressor protein; Matthews, K. S., and Nichols, J. C., 1998, Prog. Nucleic Acids Res. Mol. Biol. 58, 127-164) or by binding RNA to control either transcription termination (e.g. the PyrR protein; Switzer, R. L., et al., 1999, Prog. Nucleic Acids Res. Mol. Biol. 62, 329-367) or translation (e.g. the TRAP protein; Babitzke, P., and Gollnick, P., 2001, J. Bacteriol. 183, 5795-5802). Protein factors respond to environmental stimuli by various mechanisms such as allosteric modulation or post-translational modification, and are adept at exploiting these mechanisms to serve as highly responsive genetic switches (e.g. see Ptashne, M., and Gann, A. (2002). Genes and Signals. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.).

In addition to the widespread participation of protein factors in genetic control, it is also known that RNA can take an active role in genetic regulation. Recent studies have begun to reveal the substantial role that small non-coding RNAs play in selectively targeting mRNAs for destruction, which results in down-regulation of gene expression (e.g. see Hannon, G. J. 2002, Nature 418, 244-251 and references therein). This process of RNA interference takes advantage of the ability of short RNAs to recognize the intended mRNA target selectively via Watson-Crick base complementation, after which the bound mRNAs are destroyed by the action of proteins. RNAs are ideal agents for molecular recognition in this system because it is far easier to generate new target-specific RNA factors through evolutionary processes than it would be to generate protein factors with novel but highly specific RNA binding sites.

Many studies have now confirmed that the complex three-dimensional shapes of some RNA molecules allow them to mimic protein receptors and antibodies in their ability to selectively bind proteins or even small molecules (Gold, L., et al., 1995, Annu. Rev. Biochem. 64, 763-797; Hermann, T., and Patel, D., 2000, Science 287, 820-825). Furthermore, RNAs exhibit sufficient structural complexity to permit the formation of allosteric domains that undergo structural and functional modulation upon ligand binding (Soukup, G. A., and Breaker, R. R., 1999a, Proc. Natl. Acad. Sci. USA 96, 3584-3589; Seetharaman, S. et al., 2001, Nature Biotechnol. 19, 336-341). Natural RNAs also are capable of binding nucleotides, as demonstrated by the group I self-splicing RNA, which binds guanosine or its phosphorylated derivatives (McConnell, T. S., et al., 1993, Proc. Natl. Acad. Sci. USA 90, 8362-8366). More recently, evidence has been provided which indicates that direct binding of ATP by an RNA is essential for packaging DNA into a viral capsid (Shu, D., and Guo, P., 2003, J. Biol. Chem. 278, 7119-7125).

The known riboswitches bind their target metabolites with high affinity and precision, which are essential characteristics for any type of molecular switch that can permit accurate and sensitive genetic control. For example, a recently identified riboswitch that responds to the coenzyme S-adenosylmethionine (SAM) binds its target with a dissociation constant (K_(D)) of ~4 nM (see Example 7). Furthermore, the riboswitch can discriminate ~100-fold against S-adenosylhomocysteine, which is a natural metabolite that differs from SAM by a single methyl group and an associated positive charge. As disclosed herein (Example 1), genetic control involving riboswitches is a widespread phenomenon with regard to its biological distribution and the target molecules that are being monitored. This is supported by the observations that certain mRNAs from Archaeal organisms carry riboswitch-like domains (Stormo, G. D., and Ji., Y., 2001, Proc. Natl. Acad. Sci. USA 98, 9465-9467; Rodionov, D. A., et al., 2002, J. Biol. Chem. 277, 48949-48959) and that several mRNAs from fungi and plants bind thiamine pyrophosphate (TPP) (Sudarsan, N., et al., 2003, RNA 9:644-647).

The genetic regulation of purine transport and purine biosynthesis pathways in bacteria, which are fundamental to the metabolic maintenance of nucleotides and nucleic acids (Switzer, R. L., et al., 2002, A. L. Sonenshein, et al., eds., ASM Press, Washington, pp. 255-269), was analyzed for the presence of riboswitches. In B. subtilis, numerous genes are involved in the biosynthesis of purines (pur operon with 12 genes; Ebbole, D. J., and Zalkin, H. 1987, J. Biol. Chem. 262, 8274-8287) and in the salvage of purine bases from degraded nucleic acids. A regulatory protein factor has been proposed to participate in the control of the xpt-pbuX operon that encodes a xanthine phosphoribosyltransferase and a xanthine-specific purine permease, respectively (Christiansen, L. C., et al., 1997, J. Bacteriol. 179, 2540-2550). Although the protein factor PurR is known to serve as a repressor of transcription in the presence of elevated adenine concentrations (Weng, M., et al., 1995, Proc. Natl. Acad. Sci. USA 92, 7455-7459), no protein with corresponding function has been identified in B. subtilis that responds to guanine.

As disclosed herein, the xpt-pbuX operon is controlled by a riboswitch that exhibits high affinity and high selectivity for guanine. This newfound class of riboswitches is present in the 5′-untranslated region (5′-UTR) of five transcriptional units in B. subtilis, including that of the 12-gene pur operon. Thus, direct binding of guanine by mRNAs serves as a critical determinant of metabolic homeostasis for purine metabolism in certain bacteria. Furthermore, it was determined that the known classes of riboswitches, which respond to seven distinct target molecules, appear to control at least 68 genes in Bacillus subtilis that are of fundamental importance to central metabolic pathways. These findings indicate that riboswitches play a substantial role in metabolic regulation in living systems and that direct interaction between small metabolites and RNA is a significant and widespread form of genetic regulation in bacteria.

3. Experimental Procedures

i. Chemicals and Oligonucleotides

Guanine and its analogs xanthine, hypoxanthine, adenine, guanosine, 7-methylguanine, N²-methylguanine, 1-methylxanthine, 3-methylxanthine, 8-methylxanthine, 2-aminopurine, 2,6-diaminopurine, allopurinol, 2-amino-6-mercaptopurine, lumazine, and guanine-8-³H hydrochloride were purchased from Sigma. Inosine, uric acid, 2-amino-6-bromopurine, O⁶-methylguanine and pterin were purchased from Aldrich.

DNA oligonucleotides were synthesized by the Keck Foundation Biotechnology Resource Center at Yale University, purified by denaturing PAGE and eluted from the gel by crush-soaking in 10 mM Tris-HCl (pH 7.5 at 23° C.), 200 mM NaCl, and 1 mM EDTA. Oligonucleotides were recovered from solution by precipitation with ethanol.

ii. Phylogenetic Analyses

G box domains were identified by sequence similarity to the xpt-pbuX 5′-UTR by conducting a BLASTN search of GenBank using default parameters. These hits were expanded by searching for degenerate matches to the pattern (<<<<[2] TA [6]<<<[2] ATNNGG [2]>>> [5] GTNTCTAC[3]<<<<<[3] CCNNNAA [3]>>>>>[5]>>>>) (SEQ ID NO:377). Angle brackets indicate base pairing. Bracketed numbers are variable gaps with constrained maximum lengths denoted. A total of four violations of this pattern were permitted when forming the phylogeny depicted in FIG. 23. It is important in this instance to note that only the BS3-xpt domain (that of the xpt-pbuX leader) has been shown to bind guanine. It was demonstrated that the molecular specificity of the VV1 representative is for adenine and not guanine (unpublished data). Given the possible trivial means by which a guanine-binding RNA aptamer might be altered to bind adenine (e.g. a C to U change if the C residue is used by the aptamer to make a Watson-Crick pairing interaction with guanine), it cannot be ruled out that other representatives also have altered molecular recognition.
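
The idea of a degenerate match that tolerates a fixed number of violations can be sketched in Python as follows. This is a deliberately reduced illustration and not the search tool used for FIG. 23: it scans for a single ungapped segment of the pattern (for example ATNNGG), treats N as matching any base, and counts mismatches. It does not model the base-pairing constraints or the variable-length gaps of the full pattern. The function names and the example sequences are hypothetical.

```python
def count_mismatches(window, motif):
    """Count positions where window differs from motif; 'N' in motif matches any base."""
    return sum(1 for base, symbol in zip(window, motif)
               if symbol != "N" and base != symbol)


def find_degenerate(sequence, motif, max_mismatches=0):
    """Return 0-based start positions where motif matches sequence
    with at most max_mismatches violations."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if count_mismatches(window, motif) <= max_mismatches:
            hits.append(start)
    return hits


# Made-up sequences for illustration only (not taken from any genome).
exact_example = "CCATACGGTT"
near_example = "CCATACGATT"

print(find_degenerate(exact_example, "ATNNGG"))                   # exact degenerate match
print(find_degenerate(near_example, "ATNNGG"))                    # no match without violations
print(find_degenerate(near_example, "ATNNGG", max_mismatches=1))  # match with one violation
```

Allowing a small violation budget, as in the four violations permitted for the phylogeny, widens the set of candidate sequences at the cost of admitting representatives whose ligand specificity may differ.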

iii. In-Line Probing of RNA Constructs

The B. subtilis 201 xpt leader and truncated 93 xpt aptamer RNAs were prepared by in vitro transcription using T7 RNA polymerase and the appropriate PCR DNA templates, and were subsequently 5′ ³²P-labeled using a protocol similar to that described previously (Seetharaman, S. et al., 2001, Nature Biotechnol. 19, 336-341). Labeled precursor RNAs (~2 nM) were subjected to in-line probing using conditions similar to those described in Example 2. Reactions (10 μL) were incubated for 40 hr at 25° C. in a buffer containing 50 mM Tris (pH 8.5 at 25° C.), 20 mM MgCl₂ and 100 mM KCl in the presence or absence of purines as indicated for each experiment. Purine concentrations ranging from 1 nM to 10 μM were typically employed but ranged as high as 300 μM for poor-binding ligands. Denaturing 10% PAGE was used to separate spontaneous cleavage products and a Molecular Dynamics PhosphorImager was used to view the results. Quantitation of spontaneous cleavage yields was achieved by using ImageQuaNT software. Since concentrations of RNA below 2 nM for in-line probing cannot be used easily due to insufficient levels of signal, apparent K_(D) values near this concentration reflect the maximum possible value.
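
The reason an apparent K_(D) measured near the RNA concentration is only an upper limit can be illustrated with the exact solution for 1:1 binding. When the RNA concentration is not negligible relative to K_(D), a meaningful share of the added ligand is consumed by complex formation, so the total ligand concentration needed for half-saturation is K_(D) plus one half of the RNA concentration. The Python sketch below assumes simple 1:1 binding at equilibrium. The function names and the trial K_(D) values are hypothetical; only the 2 nM RNA concentration comes from the procedure above.

```python
import math


def fraction_rna_bound(ligand_total_nm, rna_total_nm, kd_nm):
    """Exact fraction of RNA bound for 1:1 binding (quadratic solution),
    without assuming that ligand is in excess over RNA."""
    b = ligand_total_nm + rna_total_nm + kd_nm
    complex_nm = (b - math.sqrt(b * b - 4.0 * ligand_total_nm * rna_total_nm)) / 2.0
    return complex_nm / rna_total_nm


def apparent_half_point(rna_total_nm, kd_nm):
    """Total ligand concentration giving half-saturation: K_D + [RNA]/2."""
    return kd_nm + rna_total_nm / 2.0


RNA_NM = 2.0  # labeled RNA concentration used for in-line probing

# Hypothetical true K_D values (nM), for illustration only.
for true_kd_nm in (0.1, 1.0, 5.0):
    midpoint = apparent_half_point(RNA_NM, true_kd_nm)
    check = fraction_rna_bound(midpoint, RNA_NM, true_kd_nm)
    print(f"true K_D {true_kd_nm:4.1f} nM -> apparent midpoint {midpoint:4.1f} nM "
          f"(fraction bound there: {check:.2f})")
```

In this model the observed midpoint never falls below half the RNA concentration, so a very tight binder and a moderately tight one can give similar apparent values.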

iv. Equilibrium Dialysis

Equilibrium dialysis assays were conducted using a DispoEquilibrium Dialyzer (ED-1, Harvard Bioscience), wherein two chambers a and b were separated by a 5,000 MWCO membrane. The final composition of buffer included 50 mM Tris-HCl (pH 8.5 at 25° C.), 20 mM MgCl₂ and 100 mM KCl (30 μL delivered to each chamber). Chamber a also contained 100 nM ³H-guanine, while chamber b also contained 300 nM of xpt RNA constructs as indicated for each experiment. After 10 hr of equilibration at 25° C., a 5 μL aliquot from each chamber was removed for quantitation by liquid scintillation counting. When appropriate, an additional 5 μL of buffer was added to a and an equivalent volume of buffer containing 500 nM unlabeled purine was added to b. After an additional 10 hr incubation at 25° C., 5 μL aliquots were again drawn for quantitation of tritium distribution.
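
The expected outcome of such an assay can be estimated from a simple mass balance. At equilibrium the free ligand concentration is the same in both chambers, while RNA-bound ligand is retained in chamber b because the RNA cannot cross the membrane. The Python sketch below is an idealized model under stated assumptions: equal chamber volumes, 1:1 binding, fully active RNA, complete equilibration, and no correction for the aliquots removed. The K_(D) of 5 nM is an assumed input based on the apparent value reported elsewhere in this example, and the function name is hypothetical.

```python
import math


def fraction_in_chamber_b(ligand_nm, rna_nm, kd_nm):
    """Fraction of total ligand found in the RNA-containing chamber at equilibrium.

    ligand_nm is the ligand concentration as initially loaded into one chamber.
    Mass balance with equal volumes: 2*F + B = ligand_nm, B = rna_nm * F / (kd_nm + F),
    which rearranges to 2*F^2 + (2*kd + rna - ligand)*F - ligand*kd = 0.
    """
    a = 2.0
    b = 2.0 * kd_nm + rna_nm - ligand_nm
    c = -ligand_nm * kd_nm
    free_nm = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return (ligand_nm - free_nm) / ligand_nm


ASSUMED_KD_NM = 5.0  # assumption: apparent K_D for guanine reported in this example

no_rna = fraction_in_chamber_b(100.0, 0.0, ASSUMED_KD_NM)
with_rna = fraction_in_chamber_b(100.0, 300.0, ASSUMED_KD_NM)
print(f"fraction of tritium in chamber b without RNA: {no_rna:.2f}")
print(f"fraction of tritium in chamber b with 300 nM RNA: {with_rna:.2f}")
```

Measured fractions can fall short of such an idealized estimate, for example if part of the RNA population is not in a binding-competent fold, so the model is best read as an upper bound on the expected shift.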

v. Construction of xpt-lacZ Fusions

Genetic manipulations were conducted using approaches similar to those described elsewhere herein. Briefly, a DNA construct encompassing nt −121 to +197 relative to the transcription start site of the xpt-pbuX operon from B. subtilis strain 1A40 (Bacillus Genetic Stock Center, Columbus, Ohio) was PCR amplified as an EcoRI-BamHI fragment. The product was cloned into pDG1661 at a site directly upstream of the lacZ reporter gene. Mutants were created within the engineered pDG1661 by using the appropriate primers and the QuikChange Site-directed mutagenesis kit (Stratagene). Plasmid variants were integrated into the amyE locus of strain 1A40. Transformants were selected for chloramphenicol (5 μg ml⁻¹) resistance and screened for sensitivity to spectinomycin (100 μg ml⁻¹). The integrity of each construct was confirmed by sequencing.

vi. Guanine-Mediated Modulation of β-Galactosidase Expression

B. subtilis cells were grown with shaking at 37° C. in minimal media containing 0.4% w/v glucose, 20 g L⁻¹ (NH₄)₂SO₄, 25 g L⁻¹ K₂HPO₄.3H₂O, 6 g L⁻¹ KH₂PO₄, 1 g L⁻¹ sodium citrate, 0.2 g L⁻¹ MgSO₄.7H₂O, 0.2% glutamate, 5 μg ml⁻¹ chloramphenicol, 50 μg ml⁻¹ L-tryptophan, 50 μg ml⁻¹ L-lysine and 50 μg ml⁻¹ L-methionine. Purines were added at a final concentration of 0.5 μg ml⁻¹. Cells at mid exponential stage (A₅₉₅ of ~0.1) were harvested by centrifugation and resuspended in minimal media in the absence or presence of a purine (0.5 mg mL⁻¹) as indicated for each experiment. Although the poor solubility of guanine causes the formation of a detectable level of precipitate at this concentration, no adverse effects on cell growth were observed. Unless otherwise specified, cells were incubated for an additional 3 hrs before performing β-galactosidase assays. Data presented in FIG. 28C were generated as described above with the exception that β-galactosidase assays were performed at the times indicated.

4. Results and Discussion

i. A Conserved Domain in the 5′-UTR of Several B. subtilis mRNAs

The xpt-pbuX operon is regulated by guanine, hypoxanthine, and xanthine. These purine compounds share chemical similarity and are adjacent to each other in the pathways of purine salvage. In contrast to the pur operon, regulation of the xpt-pbuX operon remains unaffected by adenine in a strain wherein adenine deaminase is inactive (Christiansen, L. C., et al., 1997, J. Bacteriol. 179, 2540-2550). These observations had fostered speculation that an unidentified protein factor might be involved in guanine recognition (Ebbole, D. J., and Zalkin, H. 1987, J. Biol. Chem. 262, 8274-8287); however, such a genetic factor has not been identified. Moreover, the 5′-UTR of the xpt-pbuX mRNA is rather large (185 nucleotides), which could be sufficient to accommodate a riboswitch domain.

Riboswitches are typically composed of two functional domains: an aptamer that selectively binds its target metabolite and an expression platform that responds to metabolite binding and controls gene expression by allosteric means. The most conserved portion of known riboswitches is the aptamer domain, whereas the adjoining expression platform can vary widely in sequence and in secondary structure. The high sequence conservation of the aptamer is due to the fact that the RNA must retain its ability to form a receptor for a chemical that does not change through evolution. In contrast, the expression platform can form one of a great diversity of structures that permit genetic control in response to ligand binding by the aptamer domain. This evolutionary conservation was exploited to conduct a database search for xpt-pbuX 5′-UTR sequences that are present in other B. subtilis genes and also in other bacterial species. Five transcriptional units within B. subtilis that closely correspond in sequence and predicted secondary structure with nucleotides 14 through 82 of the xpt-pbuX 5′-UTR (FIG. 23) were identified. A total of 32 representatives of this domain were identified amongst several Gram-positive and Gram-negative bacteria. Other members can exist as well.

From this representative set of RNAs, a consensus sequence and secondary structure for the conserved RNA motif termed the “G box” (FIG. 24A) were identified. The secondary structure of the G box is composed of a three-stem (P1 through P3) junction, wherein significant sequence conservation occurs within P1 and in the unpaired regions. Furthermore, it was found that stems P2 and P3 both favor seven base pairs in length with one- or two-base mismatches permitted. This unusual conservation of stem length implies that these structural elements establish distance and orientation constraints of their stem-loop sequences relative to the three-stem junction. Some base-pairing potential exists between the two stem-loop sequences, which might permit the formation of a pseudoknot. These characteristics indicate that G-box domains most likely use conserved secondary- and tertiary-structure elements to adopt a precise three-dimensional fold.

ii. The G Box RNA from the xpt-pbuX 5′-UTR of B. subtilis Binds Guanine

Two RNA constructs based on the xpt-pbuX 5′-UTR of B. subtilis were prepared to examine whether the mRNA selectively binds guanine or its closest analogs. A double-stranded DNA template corresponding to the entire 5′ UTR and the first four codons of the xpt-pbuX mRNA was generated by PCR using primers that introduced a promoter sequence for T7 RNA polymerase and several nucleotide additions and mutations that permit further manipulation (FIG. 24B; see also Experimental Procedures). A truncated form of this construct also was created by PCR that encompasses the 5′ half of the UTR. Upon transcription, the shorter DNA template generates a 93-nucleotide transcript termed 93 xpt, while the longer template produces a 201-nucleotide transcript termed 201 xpt.

These precursor RNAs were 5′ ³²P-labeled and subjected to an in-line probing assay (e.g. see Example 1) wherein the spontaneous cleavage of RNA linkages within an aptamer is monitored in the presence and absence of its corresponding ligand. It was found that the patterns of spontaneous cleavage of the 93 xpt (FIG. 24C) and the 201 xpt (FIG. 25A) RNAs undergo significant alteration upon addition of guanine at a concentration of 1 μM. Both hypoxanthine and xanthine also induce modulation of spontaneous cleavage at this concentration. Specifically, four major regions exhibit ligand-mediated reduction in spontaneous cleavage (FIGS. 24B and 24C). However, the presence of 1 μM adenine (and as much as 1 mM) does not alter the pattern of RNA cleavage products. These results indicate that the G box domain in the 5′ UTR of the B. subtilis xpt-pbuX mRNA serves as an aptamer for guanine and related purines, and that this aptamer undergoes significant structural modulation upon ligand binding. In the context of a riboswitch, this allosteric function could be harnessed by the mRNA to modulate structural elements that regulate gene expression.

In a preliminary assessment of the affinity that the guanine aptamer has for its target, in-line probing with 201 xpt in the presence of various concentrations of guanine was conducted. As expected, increasing concentrations provided progressively decreasing amounts of spontaneous cleavage at the four major sites of structural modulation (FIG. 25A). Half-maximum levels of modulation were observed when a concentration of ~5 nM guanine was used for in-line probing (FIG. 25B). Although this implies that the K_(D) for 201 xpt under these conditions is ~5 nM, it is important to note that the actual value might be somewhat lower because of the limitations of the in-line probing assay (see Experimental Procedures). In addition, the K_(D) was determined under non-physiological conditions (e.g. high Mg²⁺ and elevated pH), and so the binding affinity might be somewhat different in vivo. However, using this number for comparison, the affinity of the 201 xpt RNA for guanine is more than 10,000-fold greater than that of the Tetrahymena group I ribozyme for its guanosine monophosphate substrate (McConnell, T. S., et al., 1993, Proc. Natl. Acad. Sci. USA 90, 8362-8366). This difference most likely reflects the relative differences in concentrations of the two compounds that the RNAs experience inside their respective cellular environments.

iii. The Guanine Aptamer Discriminates Against Many Purine Analogs

To maintain precise metabolic homeostasis, the cell must be able to sense the concentration of its target metabolite, but also must prevent regulatory cross talk with other compounds that otherwise might inadvertently trigger genetic modulation. Indeed, a hallmark of other riboswitches is the ability to discriminate between closely related metabolites. For example, the FMN and TPP riboswitches discriminate against the unphosphorylated coenzyme precursors riboflavin and thiamine, respectively, by ~1,000 fold (see Examples 2 and 3).

This requirement for obligate molecular discrimination against related metabolites is expected to be extreme with guanine riboswitches, as there are numerous purine nucleosides and nucleotides, purine bases, and purine-like compounds that are present in the cell. Using the in-line probing strategy described in FIG. 25, the apparent K_(D) values of the 93 xpt RNA were established for a variety of purines and purine analogs. Hypoxanthine and xanthine exhibit K_(D) values that are closest to the value determined for guanine, while adenine has a K_(D) value in excess of 300 μM (FIG. 26A). These results are consistent with the observation that adenine does not significantly repress expression of the xpt-pbuX operon as do the other purines (Christiansen, L. C., et al., 1997, J. Bacteriol. 179, 2540-2550). However, it is not clear whether hypoxanthine and xanthine might repress gene expression by directly binding a guanine riboswitch, or whether they might first be converted into guanine before influencing genetic control.

It was found that alteration of every functionalized position on the guanine heterocycle causes a substantial loss of binding affinity (FIG. 26B, FIG. 27). For example, the oxygen atom at position 6 of guanine is a significant determinant of molecular recognition, as demonstrated by the losses in binding affinity (increases in apparent K_(D)) for 2-aminopurine (>10,000-fold loss), 2-amino-6-bromopurine (~1,000 fold), and O⁶-methylguanine (>100 fold). Most molecular interactions could be explained by invoking hydrogen-bonding contacts between the RNA and guanine, with the exception of the molecular interaction at C8. Here, presumably the RNA structure creates a steric clash with analogs that carry additional bulk, such as 8-methylxanthine (>10,000 fold) and uric acid (>10,000 fold).
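The fold-loss figures quoted above are ratios of apparent K_(D) values, analog over reference ligand. A minimal sketch of that arithmetic follows; the analog labels and K_(D) values are hypothetical placeholders chosen only to show the calculation, not measured values from this disclosure.

```python
def fold_loss(kd_analog_nM, kd_reference_nM):
    """Fold loss in affinity of an analog relative to the reference ligand."""
    if kd_analog_nM <= 0 or kd_reference_nM <= 0:
        raise ValueError("K_D values must be positive")
    return kd_analog_nM / kd_reference_nM


# Hypothetical apparent K_D values in nM (illustration only).
reference_kd_nM = 5.0
analog_kds_nM = {
    "analog A": 500.0,
    "analog B": 5_000.0,
    "analog C": 50_000.0,
}

losses = {
    name: fold_loss(kd, reference_kd_nM) for name, kd in analog_kds_nM.items()
}
for name, loss in losses.items():
    print(f"{name}: {loss:,.0f}-fold loss in affinity")
```

When an analog shows no detectable binding at the highest concentration tested, only a lower bound on the fold loss can be stated, which is why several of the values above are reported with a ">" sign.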

A summary of the likely molecular recognition features that the guanine aptamer requires for maximum affinity is depicted in FIG. 26C. However, the likely possibility that significant binding affinity could be derived through base stacking was not examined. The presence of so many productive contacts between the RNA and all faces of guanine suggests that the ligand is most likely entirely engulfed by the aptamer's structure. This would also explain why the RNA is capable of generating recognition via steric occlusion of bulkier compounds such as uric acid. In certain biological environments, for example, uric acid can build up to high concentrations that permit crystallization. In such environments, a bacterium would require a high level of discrimination to prevent undesirable repression of guanine-regulated genes. In light of such molecular recognition challenges, it is not surprising that an RNA genetic switch would evolve extensive molecular contacts with its target compound.

iv. Confirmation of Guanine Aptamer Function by Equilibrium Dialysis

Equilibrium dialysis was used to provide further evidence that the G box RNA from the xpt-pbuX operon binds guanine preferentially over other purines and purine analogs. A substantial shift in tritiated guanine is expected to occur in a two-chamber dialysis apparatus when an excess of functional RNA is added to one chamber (FIG. 27A). Furthermore, this shifted equilibrium should return to unity upon addition of an excess of unlabeled competitor ligand. As expected, it was observed that greater than 90% of tritiated guanine co-localizes with 93 xpt RNA, and subsequently redistributes when an excess of unlabeled guanine is introduced. In contrast, the presence of excess unlabeled analogs has no effect on co-localization of ³H-guanine and the RNA (FIG. 27B). Even the nucleoside guanosine (9-ribosylguanine) fails to restore equal distribution of guanine between the two chambers, which is consistent with the RNA folding to form a tight pocket for the base alone.
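The two-chamber readout described above can be summarized numerically. The sketch below assumes equal chamber volumes and that free ligand reaches the same concentration on both sides of the membrane, so that any excess label in the RNA chamber is attributed to bound ligand. The function name and the scintillation counts are hypothetical.

```python
def dialysis_summary(counts_rna_chamber, counts_other_chamber):
    """Summarize a two-chamber equilibrium dialysis measurement.

    Assumes equal chamber volumes and equal free-ligand concentration
    in both chambers at equilibrium.
    """
    if counts_rna_chamber < 0 or counts_other_chamber <= 0:
        raise ValueError("counts must be non-negative and the divisor positive")
    total = counts_rna_chamber + counts_other_chamber
    return {
        # Ratio of label between chambers; 1.0 ("unity") means no shift.
        "ratio": counts_rna_chamber / counts_other_chamber,
        # Fraction of all label found in the chamber that holds the RNA.
        "fraction_colocalized": counts_rna_chamber / total,
        # Excess label over the free-ligand baseline, taken as RNA-bound.
        "fraction_bound": max(0.0, (counts_rna_chamber - counts_other_chamber) / total),
    }


# Hypothetical counts: strong shift with RNA alone, then after adding
# excess unlabeled competitor that displaces the labeled ligand.
shifted = dialysis_summary(9000.0, 1000.0)
competed = dialysis_summary(5000.0, 5000.0)
print(shifted)
print(competed)
```

Under these assumptions, a reading of 90% co-localization corresponds to a chamber ratio of 9 and to 80% of the total label being RNA-bound, since 10% of the label in the RNA chamber is free ligand matching the other side.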

Both in-line probing and equilibrium dialysis data indicate that this natural aptamer binds guanine with high affinity and specificity. In a previous study, in vitro evolution was used to isolate a purine-binding aptamer from a pool of random-sequence RNAs (Kiga, D., et al., 1998, Nucleic Acids Res. 26, 1755-1760). This engineered aptamer exhibits a K_(D) of 1.3 μM for guanine, and shows only a 2- to 3-fold discrimination against hypoxanthine and xanthine. The lower specificity and affinity of this aptamer for selected purines are due to the fact that only the N1, N7 and O6 positions are important for molecular recognition. In contrast, the G box RNA appears to make productive contacts with all available functional groups on guanine, presumably through hydrogen bonding (FIG. 26C).

v. Aptamer Mutations Affect Guanine Binding and Genetic Control

A variety of mutations were introduced into the G box domain to examine the importance of several structural elements and conserved nucleotides (FIG. 28A). The influence of these mutations on guanine binding was determined in the context of the 93 xpt RNA by using equilibrium dialysis. Mutations that independently disrupt the three stems (M1, M4 and M6) cause a loss of binding function, as does a variant RNA (M3) that carries two mutations in the central junction (FIG. 28B). In contrast, the effects of the disruptive stem mutations are largely reversed by making compensatory mutations (M2, M5 and M7) that restore base pairing. These results are consistent with the phylogenetic analysis (FIG. 23), which indicates that stem structure is important but that the precise sequence composition of these elements is of less importance.

Binding function of variant aptamers in vitro also correlates with genetic control in vivo. The results disclosed herein confirmed earlier findings that a reporter gene carrying the 5′-UTR of the xpt-pbuX mRNA is repressed by guanine, and to a lesser extent by hypoxanthine and xanthine (Christiansen, L. C., et al., 1997, J. Bacteriol. 179, 2540-2550). Specifically, transcriptional fusions were created between a β-galactosidase reporter gene and variant xpt-pbuX 5′-UTR sequences carrying the mutations described in FIG. 28A. B. subtilis chromosomal transformants using the wild-type sequence exhibit the expected levels of genetic modulation (FIG. 28C). Although the xpt aptamer exhibits dissociation constants for xanthine and hypoxanthine that are essentially identical in vitro, the differences in genetic modulation by these compounds in vivo might be due to differences in their cellular concentrations.

Aptamer variants with impaired guanine binding in vitro also exhibit a loss of β-galactosidase repression (FIG. 28D). Furthermore, restoration of base pairing in stems P1 through P3 results in restored genetic control. The M2 variant is of particular interest because it not only exhibits restored genetic control, but also provides modest expression of β-galactosidase in the absence of guanine. Riboswitch function requires the action of an aptamer for molecular sensing as well as an expression platform that transduces RNA-ligand complex formation into a genetic response. Examples of TPP and FMN riboswitches (see Examples 2 and 3) appear to function by differential formation of terminator and antiterminator structures. Such ligand-induced formation of transcription anti-termination structures also appears to be the basis of expression platform mechanisms used by numerous SAM riboswitches (see Example 7). Construct M2 carries three mutations within the putative anti-terminator structure of the xpt-pbuX leader, and thus is expected to exhibit an overall reduction of reporter gene expression because these mutations should bias structure folding towards terminator stem formation.

The results of these mutational and functional analyses confirm the major features of the secondary structure model (P1 through P3) and demonstrate that they are critical for metabolite binding. Furthermore, the correlation between ligand binding and genetic control indicates that the G box and adjacent nucleotides of the xpt-pbuX leader sequence operate in concert to function as a guanine-dependent riboswitch, most likely by operating via allosteric control of transcription termination.

vi. Riboswitches Control Fundamental Biochemical Pathways

Our findings indicate that the G box RNA of the xpt-pbuX operon is a key structural element of a guanine-sensing riboswitch that exhibits extraordinary affinity and selectivity for its target. In B. subtilis, this general riboswitch motif appears to control at least five transcriptional units (FIG. 23). Although the precise function of several of the gene products in this newly identified regulon has not been clearly defined, the known genes from B. subtilis and from other organisms are mostly related to purine metabolism. Based on the results disclosed herein, it is likely the G box domain within the 5′-UTR of this large pur operon is responsible for guanine-dependent riboswitch regulation, and that the genetic regulatory mechanism might be similar to that proposed herein for the xpt-pbuX operon.

The distribution of G box domains in B. subtilis and other bacteria suggests that this class of metabolite-binding RNAs controls a regulon that is essential for cell survival. In B. subtilis, guanine riboswitches (or related adenine-dependent riboswitches; see the legend to FIG. 23) appear to provide at least some contribution to the genetic regulation of 17 genes. The discovery of guanine-dependent riboswitches adds to a growing list of similar metabolite-sensing RNAs. For example, a class of riboswitches that responds to SAM (McDaniel, B. A. M., et al., 2003, Proc. Natl. Acad. Sci. USA 100, 3083-3088; Epshtein, V., et al., 2003, Proc. Natl. Acad. Sci. USA 100, 5052-5056) controls a regulon of as many as 26 genes that are involved in coenzyme biosynthesis, amino acid metabolism, and sulfur metabolism. When included with genes that are controlled by other riboswitch classes, at least 68 B. subtilis genes (nearly 2% of the organism's total genetic complement) are under riboswitch control (FIG. 29).

Riboswitches for ligands such as guanine and SAM apparently are serving as master control molecules whose concentrations are being monitored to ensure homeostasis of a much wider set of metabolic pathways. Riboswitches also seem to permit metabolite surveillance and genetic control with the same level of precision and efficiency as that exhibited by protein factors. Therefore, these RNA switches could have emerged late in the evolution of modern biochemical architectures because they are functionally comparable to genetic switches made of protein. However, given their fundamental role in metabolic maintenance and the widespread phylogenetic distribution of certain riboswitches, it is also possible that aptamer domains similar to these were the primary mechanism by which RNA-world organisms detected metabolites and controlled biochemical pathways before the emergence of proteins.

5. Conclusions

This demonstration that guanine is sensed by metabolite-binding mRNAs expands the known classes of riboswitches, and provides additional evidence that certain bacterial RNAs are responsible for monitoring the concentrations of critical coenzymes and other compounds that are fundamental to all living systems. Phylogenetic analyses and biochemical data indicate that many bacteria and, in some instances, eukaryotes (Sudarsan, N., et al., 2003, RNA 9:644-647) entrust riboswitches to sense essential metabolites and mediate genetic control. Although protein factors undoubtedly could be used to carry out these important regulatory tasks, based on the disclosure herein, highly structured RNAs are well suited for this role. If RNA polymers were a poorly suited medium for generating metabolite receptors with high affinity and precision, then one would expect that evolution would have long ago replaced them by protein factors.

The disclosure herein (e.g. see Examples 1 and 2) is consistent with riboswitches being derivatives of an ancient genetic control system that monitored metabolic and environmental signals before the evolutionary emergence of proteins. Interestingly, each of the metabolite targets of riboswitches has been proposed to come from an RNA world (White, H. B. III., 1976, J. Mol. Evol. 7, 101-104; Benner, S. A., et al., 1989, Proc. Natl. Acad. Sci. USA 86, 7054-7058; Jeffares, D. C., et al., 1998, J. Mol. Evol. 46, 18-36; Jadhav, V. R., and Yarus, M., 2002, Biochimie 84, 877-888). The identification of guanine as a trigger for riboswitches is consistent with metabolite-sensing RNAs having originated very early in evolution. Also disclosed herein is another class of riboswitches that responds to the amino acid lysine (FIG. 29). Although all riboswitches could be more recent evolutionary inventions, even the origin of the lysine riboswitch might date from before the last common ancestor and back to a time when living systems were transitioning from a pure RNA world to a more modern metabolic state that made use of encoded protein synthesis.

G. Example 7 S-Adenosylmethionine Riboswitches

Riboswitches are metabolite-binding RNA structures that serve as genetic control elements for certain messenger RNAs. These RNA switches have been identified in all three kingdoms of life and are typically responsible for the control of genes whose protein products are involved in the biosynthesis, transport, or utilization of the target metabolite. Disclosed herein is a highly conserved RNA domain found in bacteria that serves as a riboswitch responding to the coenzyme S-adenosylmethionine (SAM) with remarkably high affinity and specificity. SAM riboswitches undergo structural reorganization upon introduction of SAM, and these allosteric changes regulate the expression of 26 genes in Bacillus subtilis. This and related findings indicate that direct interaction between small metabolites and allosteric mRNAs is a significant and widespread form of genetic regulation in bacteria.

1. Results

i. Identification of a SAM-Responsive Riboswitch

Each of the compounds sensed by previously identified riboswitches (coenzyme B₁₂, TPP, FMN) is used as a coenzyme by modern protein enzymes. Interestingly, these coenzymes have significant structural similarity to RNA, which has been used to support speculation that they might also have been used as coenzymes by ancient ribozymes in an RNA world (S. A. Benner, et al., Proc. Natl. Acad. Sci. USA 86, 7054 (1989); H. B. White III, J. Mol. Evol. 7, 101 (1976); D. C. Jeffares, et al., J. Mol. Evol. 46, 18 (1998)). If modern riboswitches are direct descendants of RNA control systems that originated in the RNA world, then the metabolites they sense and the metabolic pathways that they control will be of fundamental importance to modern biochemical processes. To further assess this hypothesis, a search was performed to identify additional riboswitches, to determine their biochemical characteristics, and to establish their role in genetic control on a genome-wide level.

In this effort the S box was examined (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)), which is a highly conserved sequence domain (FIG. 30A) that is located within the 5′-untranslated region (5′-UTR) of certain messenger RNAs in Gram-positive bacteria. Both genetic and sequence analyses suggest that the S box domain serves as a genetic control element for a regulon composed of 11 transcriptional units. These mRNAs encode as many as 26 different genes in B. subtilis that are involved in sulfur metabolism, methionine biosynthesis, cysteine biosynthesis, and SAM biosynthesis. However, the nature of the putative regulatory factor and the metabolite to which it responds had not been established (T. M. Henkin, Curr. Opin. Microbiol. 3, 149 (2000); F. J. Grundy, T. M. Henkin, Frontiers Biosci. 8, D20 (2003)). An RNA construct corresponding to the first 251 nucleotides of the yitJ mRNA of B. subtilis (FIG. 30 b) was prepared by in vitro transcription (G. A. Soukup, R. R. Breaker, RNA 5, 1308 (1999)). The yitJ gene product is a putative methylene tetrahydrofolate reductase, an enzyme proposed to be involved in methionine biosynthesis (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)). The 251 yitJ RNA was subjected to “in-line probing”, which reveals locations of structured and unstructured portions of RNA polymers by relying on the variability in rates of spontaneous RNA phosphodiester cleavage caused by differences in structural context. In-line probing can also reveal nucleotides participating in metabolite-induced structural modulation (see Examples 1-3).

Whether the 251 yitJ RNA might bind S-adenosylmethionine (SAM) was analyzed. Indeed, upon separation by polyacrylamide gel electrophoresis (PAGE), the pattern of spontaneous RNA cleavage products (FIG. 30 c) was indicative of a highly structured RNA element that undergoes conformational modulation upon introduction of SAM to a final concentration of either 0.1 mM or 1 mM. In contrast, no structural modulation was evident upon the introduction of methionine at the same concentrations, suggesting that the RNA might require both the methionine and 5′-deoxyadenosyl moieties of SAM to induce structural reorganization. The locations of the ligand-induced modulations (FIG. 30 b) indicated that the conserved core of the S box RNA serves as a natural aptamer (L. Gold, et al., Annu. Rev. Biochem. 64, 763 (1995)) for SAM. Similar results were observed with 124 yitJ, which encompasses nucleotides 28 through 149 of the mRNA leader plus two G residues at the 5′ terminus.

ii. Molecular Recognition by a SAM-Dependent Riboswitch

A genetic switch that responds to metabolites must be able to bind its target with a dissociation constant (K_(D)) that is relevant to physiological concentrations. Furthermore, the metabolite receptor must be able to discriminate precisely against closely related compounds that are likely to occur in the same milieu, or risk undesirable modulation of gene expression. Therefore, the affinity of the yitJ RNA for SAM was assessed, as was the ability of the RNA to discriminate against biologically relevant compounds that are structurally similar to this target (FIG. 31 a).

The K_(D) of 251 yitJ for SAM was determined by using in-line probing to monitor the extent of structural modulation over a range of ligand concentrations (FIG. 31 b, left). Although the K_(D) of 251 yitJ for SAM is ~200 nM, the minimized aptamer domain represented by 124 yitJ exhibits a K_(D) of ~4 nM under the disclosed assay conditions. Such improvements in binding affinity by minimized aptamer domains have been observed (see Example 2). This most likely reflects greater structural preorganization of the ligand binding form of the aptamer domain due to the elimination of the adjoining expression platform, which otherwise would permit alternative folding to occur. Tight binding was also observed when the 124 yitJ was interrogated by using a Scatchard analysis with tritiated SAM. The assessment of binding affinity indicated that the K_(D) for the 124 yitJ aptamer is more than 1000-fold improved compared to that reported recently for a related RNA (McDaniel, B. et al., Proc. Natl. Acad. Sci. USA 100, 3083-3088 (2003)). Normal concentrations of SAM in bacteria are typically in the low micromolar range (McDaniel, B. et al., Proc. Natl. Acad. Sci. USA 100, 3083-3088 (2003)); however, most of this coenzyme pool is probably bound by enzymes. Therefore, the low K_(D) exhibited by this riboswitch might be needed to sense the concentration of free SAM.
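For readers unfamiliar with Scatchard analysis, the following sketch shows the general form of the calculation: for one-site binding, a plot of bound/free against bound is linear with slope -1/K_(D) and an x-intercept equal to the total number of binding sites. The function name, the ordinary least-squares fit, and the synthetic data (generated with a hypothetical K_(D) of 4 nM) are assumptions for illustration and do not reproduce the disclosed measurements.

```python
def scatchard_fit(free_nM, bound_nM):
    """Least-squares line through (bound, bound/free); returns (K_D, B_max).

    One-site model: bound/free = (B_max - bound) / K_D.
    """
    xs = list(bound_nM)
    ys = [b / f for b, f in zip(bound_nM, free_nM)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals -1 / K_D
    intercept = mean_y - slope * mean_x  # equals B_max / K_D
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


# Synthetic, noise-free data from hypothetical parameters.
true_kd_nM, true_bmax_nM = 4.0, 10.0
free = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
bound = [true_bmax_nM * f / (true_kd_nM + f) for f in free]

kd_fit, bmax_fit = scatchard_fit(free, bound)
print(f"K_D = {kd_fit:.3f} nM, B_max = {bmax_fit:.3f} nM")
```

The linearized fit is used here only because the text names Scatchard analysis; with noisy data, a direct nonlinear fit of the binding isotherm is often preferred because the Scatchard transform distorts the error structure.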

As expected, the 124 yitJ RNA achieves a high level of molecular discrimination against analogs of SAM. For example, the RNA exhibits ~100-fold discrimination against SAH (FIG. 31 b, right), which is produced upon utilization of SAM as a coenzyme for methylation reactions (F. Takusagawa, et al., In: Comprehensive Biological Catalysis, M. Sinnott, ed., Academic Press, Vol. 1, pp. 1-30 (1998)). Thus, the aptamer must form a binding pocket for SAM that can sense the absence of a single methyl group and an associated loss of positive charge. Similarly, the RNA discriminates nearly 10,000 fold against SAC, which is another biological compound that differs from SAH by the absence of a single methylene group. This pattern of molecular discrimination was confirmed by using equilibrium dialysis (FIG. 31 c).

iii. SAM Binding by an mRNA is Required for Genetic Regulation

The secondary structure model for the SAM-binding aptamer domain was established using phylogenetic data (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)). To provide further support for this model, the influence of disruptive and compensatory mutations (FIG. 32 a) was examined, both on the binding function of the 124 yitJ RNA and on SAM-mediated genetic control of a lacZ reporter gene fused with variant riboswitches based on these mutant aptamers. Mutations that alter the conserved core of the aptamer (M1) or that disrupt base pairing in each of the four major base-paired regions (M2, M4, M6 and M8) largely result in a loss of SAM binding function as determined by equilibrium dialysis (FIG. 32 b). Compensatory mutations that restore base pairing in these stems (M3, M5, M7, M9) restore at least partial binding activity.

It has been shown (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)) that a growth medium rich in methionine leads to repression of B. subtilis genes that carry the S box domain. This is most likely due to the ability of the cell to convert methionine into an ample supply of SAM. As disclosed herein, in all cases tested, the binding function of the mutants correlates with their ability to down-regulate an appended reporter gene when presented with excess methionine in otherwise minimal growth media (FIG. 32 c). These findings are consistent with SAM binding to the mRNA being necessary for the genetic regulation of S box mRNAs.

iv. SAM Riboswitches Control Gene Expression by Transcription Termination in B. subtilis

As disclosed herein, bacterial riboswitches can control gene expression by modulating either transcription termination or translation initiation (see Examples 2 and 3), while several putative riboswitches in eukaryotes might use one of several different mechanisms. In B. subtilis, the SAM-binding aptamer domains typically reside immediately upstream from a putative transcription terminator hairpin (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)), which implies that SAM binding most likely induces transcription termination as described previously for FMN- and TPP-dependent riboswitches (see Example 3).

In vitro transcription in the absence or presence of SAM using 11 DNA templates corresponding to the mRNA leader sequences of the S box regulon was performed. These assays were simplified by using T7 RNA polymerase instead of the native B. subtilis RNA polymerase. It was observed that an FMN-dependent riboswitch induces transcription termination even when T7 RNA polymerase is used as a surrogate for the bacterial polymerase (see Example 3). In this study, it was found that the yitJ, yoaD and metK leader constructs exhibit modest transcription termination upon the addition of SAM. More dramatically, the termination product from the metI leader construct increases from ~12% to nearly 75% upon introduction of SAM (FIG. 33 a). In all instances, little or no modulation of transcription termination occurs when the analogs SAH or SAC are added to the reaction. The remaining seven S-box representatives did not exhibit significant modulation with T7 RNA polymerase, presumably because it serves as an imperfect substitute for the native polymerase. Indeed, SAM-dependent transcription termination is observed with many of these mRNA leader sequences when E. coli or B. subtilis polymerases are used in the assay (McDaniel, B. et al., Proc. Natl. Acad. Sci. USA 100, 3083-3088 (2003)).

The mechanism of SAM-induced termination (FIG. 33 b) most likely involves the ligand-mediated formation of alternative hairpin structures that permit transcriptional read-through (anti-terminator formation without SAM) or that cause termination (terminator formation with SAM). This mechanism was examined by generating several mutant metI constructs that carry disruptive or compensatory changes in the expression platform (FIG. 33 b). SAM causes an additional ~20% yield in transcription termination in a mutant (Mabc) that carries six mutations relative to the wild-type metI riboswitch but retains proper terminator and anti-terminator base complementation. However, constructs carrying only a subset of these six mutations, which do not permit normal pairing interactions to occur, exhibit little or no SAM-mediated transcription modulation. Furthermore, mutations that disrupt terminator stem formation (Ma) yield lower levels of termination, while mutations that disrupt anti-terminator stem formation (Mab, Mc) yield higher levels of termination (FIG. 33 b). These findings indicate that the RNA structural modulation induced by SAM binding mediates genetic control by sequestering an anti-terminator sequence, and thus favors the formation of a transcriptional terminator hairpin.

v. Riboswitches Control Multiple Genes that are Involved in Fundamental Biochemical Pathways

The 11 transcriptional units that comprise the regulon controlled by SAM riboswitches (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)) appear to encompass at least 26 genes that are central to sulfur metabolism, amino acid metabolism, and SAM biosynthesis. Although all 11 transcriptional units from B. subtilis carry a consensus S box element, a recent report indicates that gene expression from one of these (cysH) is not modulated by addition of methionine to the medium, as are other S box RNAs (M. C. Mansilla, et al., J. Bacteriol. 182, 5885 (2000)). The aptamer domain from B. subtilis cysH does bind SAM, but with an affinity that is more than 2 orders of magnitude poorer than that of yitJ from the same organism (FIG. 34 a). However, the cysH homolog from B. anthracis exhibits a K_(D) that matches that of yitJ (FIG. 34 b), implying that the B. subtilis cysH aptamer has suffered one or more mutations that have degraded binding affinity.

2. Conclusion

Current biochemical and bioinformatics data indicate that B. subtilis has at least 68 genes (nearly 2% of its total genetic complement) under riboswitch control. Moreover, each of these mRNAs is responding to biological compounds that are universal in biology. The fact that genetic control elements for fundamental metabolic processes are formed by RNA indicates that this polymer has the structural sophistication needed to precisely monitor chemical environments and transduce metabolite binding events into genetic responses. A more detailed analysis of riboswitch structures at the atomic level would be of great utility in determining how metabolite binding promotes allosteric reorganization of RNA genetic switches.

Riboswitches for ligands such as SAM and guanine appear to be serving as master control molecules whose concentrations are being monitored to ensure homeostasis of a much wider set of metabolic pathways. Riboswitches seem to permit metabolite surveillance and genetic control with the same level of precision and efficiency as that exhibited by protein factors, and thus could have emerged late in the evolution of modern biochemical architectures.

3. Methods

i. DNA Oligonucleotides and Chemicals

Synthetic DNAs were purchased from The Keck Foundation Biotechnology Resource Center at Yale University. Preparation of RNAs by in vitro transcription was conducted (Seetharaman, S., et al., Nat. Biotechnol. 19, 336-341 (2001)) and the products were purified as described in Example 2. SAM, various analogs of SAM, and S-adenosyl-L-methionine-methyl-³H (³H-SAM) were purchased from Sigma.

ii. DNA Constructs

A yitJ DNA construct encompassing nucleotides −380 to +15 relative to the translation start site was prepared using primers that generated EcoRI and BamHI restriction sites upon PCR amplification of B. subtilis chromosomal DNA (strain 168). The product was cloned into pDG1661 (ref 26; Bacillus Genetic Stock Center, Columbus, Ohio) using these restriction sites, which places the riboswitch immediately upstream of the lacZ reporter gene. Mutants were created by using the appropriate mutagenic primers and the QuikChange site-directed mutagenesis kit (Stratagene). All constructs were confirmed by sequencing.

iii. In Vivo Analysis of Riboswitch Function

B. subtilis strain 1A234 was obtained from the Bacillus Genetic Stock Center, Columbus, Ohio. Cells were grown with shaking at 37° C. either in rich media (2XYT broth or tryptose blood agar base) or defined media (0.5% w/v glucose, 20 g L⁻¹ (NH₄)₂SO₄, 183 g L⁻¹ K₂HPO₄·3H₂O, 60 g L⁻¹ KH₂PO₄, 10 g L⁻¹ sodium citrate, 2 g L⁻¹ MgSO₄·7H₂O, 5 μM MgCl₂, 50 μg L⁻¹ tryptophan, and 50 μg L⁻¹ glutamate). Methionine was added to 50 μg L⁻¹ for routine growth. Growth under methionine-limiting conditions was established by incubation under routine growth conditions to an A₅₉₅ of 0.1, at which time the cells were pelleted by centrifugation, resuspended in minimal media, split into two aliquots, and supplemented with methionine at either 50 μg L⁻¹ (+methionine) or 0.75 μg L⁻¹ (−methionine) (FIG. 32 c). Cultures were incubated for an additional 3 hr before performing β-galactosidase assays. Transformations of pDG1661 variants (see DNA constructs) into B. subtilis were performed as described elsewhere (H. Jarmer, et al., FEMS Microbiol. Lett. 206, 197 (2002)). The correct transformants were identified by selecting for chloramphenicol (5 μg mL⁻¹) resistance and screening for spectinomycin (100 μg mL⁻¹) sensitivity. Proper site-specific genomic insertion by double cross-over recombination was confirmed by PCR using amyE-specific primers.

iv. In Vitro Transcription Termination Assays

Transcription reactions (10 μL) containing ~30 pmoles of specific template DNA, 200 μM each NTP, 5 μCi [α-³²P]UTP (1 Ci=37 GBq) and 50 units of T7 RNA polymerase (New England Biolabs) were incubated in the presence of 50 mM Tris-HCl (pH 7.5 at 23° C.), 15 mM MgCl₂, 2 mM spermidine, 5 mM DTT at 37° C. for 2 hr. SAM and its analogs were added to a final concentration of 50 μM. Transcription templates were generated for all 11 riboswitch domains in the S box regulon of B. subtilis by using PCR with corresponding primers that in each case produced transcripts beginning with GG, encompassing the putative natural transcription start (F. J. Grundy, T. M. Henkin, Mol. Microbiol. 30, 737 (1998)), and including the first 13 codons of the adjoining open reading frame. Transcription products were separated by denaturing 6% PAGE and visualized by PhosphorImager. Termination yields were approximated by determining the ratio of RNAs in the termination band relative to the combined terminated and full-length RNAs.
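The termination-yield calculation stated in the last sentence above amounts to a simple ratio. A minimal sketch follows; the function name and the band intensities are hypothetical, and any correction for background signal or for the differing number of labeled uridines in the two transcripts is deliberately left out.

```python
def termination_yield(terminated_signal, full_length_signal):
    """Fraction of transcripts in the termination band.

    yield = terminated / (terminated + full-length)
    """
    if terminated_signal < 0 or full_length_signal < 0:
        raise ValueError("band intensities must be non-negative")
    total = terminated_signal + full_length_signal
    if total == 0:
        raise ValueError("no signal in either band")
    return terminated_signal / total


# Hypothetical band intensities (arbitrary units) without and with ligand.
yield_without_ligand = termination_yield(12.0, 88.0)
yield_with_ligand = termination_yield(75.0, 25.0)
print(f"without ligand: {yield_without_ligand:.0%}")
print(f"with ligand:    {yield_with_ligand:.0%}")
```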

H. Example 8 Adenine Riboswitches

A class of riboswitches that recognizes guanine and discriminates against most other purine analogs was recently identified (see Example 6). Representative RNAs that carry the consensus sequence and structural features of guanine riboswitches are located in the 5′-untranslated region (UTR) of numerous genes of prokaryotes, where they control expression of proteins involved in purine salvage and biosynthesis. This example shows that three representatives of this phylogenetic collection bind adenine with values for apparent dissociation constant (apparent K_(D)) that are several orders of magnitude better than for guanine. The preference for adenine is due to a single nucleotide substitution in the core of the riboswitch, wherein each representative most likely recognizes its corresponding ligand by forming a Watson/Crick base pair. In addition, the adenine-specific riboswitch associated with the ydhL gene of Bacillus subtilis functions as a genetic ‘ON’ switch, wherein adenine binding causes a structural rearrangement that precludes formation of an intrinsic transcription terminator stem.

Guanine-sensing riboswitches are a class of RNA genetic control elements that modulate gene expression in response to changing concentrations of this compound (see Example 6). This is one of a number of classes of metabolite-binding riboswitches that regulate gene expression in response to various fundamental compounds such as lysine and the coenzymes FMN, SAM, B₁₂ and TPP (thiamin pyrophosphate) (see Example 6). Typically, each riboswitch is composed of two functional domains, an aptamer and an expression platform, that function together as a transducer of chemical signals into altered patterns of gene expression. The aptamer serves as a specific receptor for the target metabolite, wherein ligand binding brings about allosteric changes in both the aptamer and expression platform domains.

Detailed examinations of the ligand specificities for the natural aptamers from guanine- and lysine-specific riboswitches have been conducted (see Example 6), and less comprehensive examinations of the FMN, SAM, B₁₂ and TPP aptamers have been conducted (see Examples 1-3). In each case, the RNAs exhibit high levels of molecular discrimination by disfavoring the binding of even closely related metabolite analogs. This characteristic of high molecular discrimination is a hallmark of enzymes and receptors, including genetic regulatory factors, which need to carry out biological processes with great precision in the presence of complex chemical mixtures.

The molecular recognition characteristics of guanine riboswitches are distinguished by the fact that nearly every position around the purine heterocycle appears to be critical for high affinity binding by the aptamer. Thus, the arrangement of the binding pocket permits the riboswitch to control gene expression in response to changing guanine concentrations, but prevents modulation of gene expression in response to increasing concentrations of adenine (see Example 6; Christiansen, L. C., et al., J. Bacteriol. 179, 2540-2550 (1997)). However, it is likely that receptors made of RNA, like their protein counterparts, could acquire altered molecular recognition characteristics as a result of natural selection. This would permit riboswitches to emerge through evolution that selectively sense and respond to metabolites that are proximal in metabolic pathways.

This example confirms the existence of a variant class of riboswitches that responds to adenine. These riboswitches carry an aptamer domain that corresponds closely in sequence and secondary structure to the guanine aptamer described recently (see Example 6). However, each representative of the adenine sub-class of riboswitches carries a C to U mutation in the conserved core of the aptamer, indicating that this residue is involved in metabolite recognition. The results indicate that the identity of this single nucleotide determines the binding specificity between guanine and adenine, which provides an example of how complex riboswitch structures could mutate to recognize new metabolite targets.

1. Results

i. Phylogenetic Comparison Between Riboswitch Domains

A comparative sequence strategy was used to identify a series of intergenic regions from a number of prokaryotic species that carry a conserved sequence element termed the “G box” (see Example 6). B. subtilis carries at least five of these motifs, which were also identified using genetics techniques (Johansen, L. E., et al., J. Bacteriol. 185, 5200-5209). Each representative of the phylogeny has three potential base-paired elements (P1 through P3) and as many as 24 nucleotides that are conserved in greater than 90% of the examples identified to date. A subset of this phylogeny with features common to the G box motif highlighted is presented herein (FIG. 35A). When selected representatives are examined in greater detail, they are encompassed by the mRNA transcript of the gene immediately downstream, and thus are present as RNA elements located in the 5′-UTR of certain mRNAs.

Several notable differences present in the guanine-binding domain of xpt (FIG. 35B) relative to the RNA from ydhL (FIG. 35C) were identified. First, among the 23 sequence variations in ydhL compared to xpt, 20 reside within base-paired elements and most of these changes permit base pairing to be retained. This strongly indicates that the overall secondary structure between the two RNAs is similar. Second, the remaining three mutations reside in unpaired regions, such that two (corresponding to positions 31 and 48 relative to xpt) reside at locations that are known to be variable. These mutations do not impact significantly the structure and function of the RNA. Third, the remaining mutation is a C to U change at position 74 relative to xpt, which otherwise corresponds to a strictly conserved nucleotide of the three-stem junction. Given the location of this mutation, this change might alter the molecular recognition characteristics of the ydhL aptamer.

ii. Variant G Box RNAs Selectively Bind Adenine

It had been established (see Example 6) that the xpt aptamer makes numerous contacts with its ligand, and that as many as seven hydrogen bonds might be involved in forming the RNA-ligand complex. Furthermore, there is evidence that steric clashes also likely aid in restricting the range of metabolites that can be bound by the RNA. This array of contacts can only be established by forming multiple interactions between the various sides of guanine and distal parts of the RNA.

An intriguing hypothesis is that the C residue at position 74 of xpt forms a Watson/Crick base pair with guanine, thus forming three of these hydrogen bonds. Since a U mutation resides in the corresponding position in B. subtilis ydhL and two RNAs from C. perfringens and V. vulnificus, we believe that these RNAs might serve as adenine-responsive riboswitches. This hypothesis was further supported by recognition that the latter two genes (add) encode adenine deaminase enzymes. It seems reasonable that adenine should be the metabolite whose concentration is being monitored to determine the expression levels of adenine deaminase.

The ligand specificity of five G box RNAs (FIG. 35A) was examined by using in-line probing (Soukup, G. A. & Breaker, R. R. RNA 5, 1308-1325 (1999); Soukup, G. A., DeRose, E. C., RNA 7, 524-536 (2001)). In this assay, the spontaneous cleavage of RNA is monitored in the absence of ligand, or in the presence of guanine or adenine. As predicted previously (see Example 6), the purE RNA (FIG. 36A) exhibits changes in the pattern of spontaneous cleavage products in the presence of guanine that correspond to those observed for the xpt RNA (FIG. 36B). These results confirm that the purE RNA, like the xpt RNA, responds allosterically to guanine and not to adenine when incubated in the presence of the concentrations of ligand tested.

In contrast, all three RNAs that carry the C to U mutation in the junction between P1 and P3 (corresponding to C74 of xpt) do not respond to guanine, but exhibit structural modulation only when incubated in the presence of adenine. Furthermore, the patterns of spontaneous cleavage for the adenine-specific aptamers are consistent with the secondary-structure model proposed for G box RNAs (FIG. 35). These results indicate that certain variants of the G box class of RNAs serve as sensors of adenine. Furthermore, these findings are consistent with the hypothesis that, when located in their natural settings, the ydhL RNA from B. subtilis and the two add RNAs from C. perfringens and V. vulnificus serve as adenine-specific riboswitches.

iii. The ydhL Aptamer Binds Adenine with High Affinity and Selectivity

Another characteristic of riboswitches is that the aptamer domains exhibit tight binding for their corresponding target compound, and they discriminate against analogs, in some cases, by orders of magnitude in apparent K_(D). For example, the guanine riboswitch from B. subtilis xpt exhibits an apparent K_(D) for guanine of ˜5 nM, but binds adenine with an apparent K_(D) that is at least 100,000-fold poorer. In-line probing assays were used to determine the binding affinities of the B. subtilis 80 ydhL RNA for these two purines. As expected, the RNA exhibits progressively changing patterns of spontaneous RNA cleavage fragments in the presence of increasing concentrations of adenine (FIG. 37A), but the pattern remains unchanged with increasing guanine concentrations as high as 10 μM (see below).

The bands corresponding to spontaneous cleavage fragments that undergo change with increasing adenine concentrations were grouped into four sites and the extent of cleavage relative to the total RNA present was quantitated. These data were used to generate a plot (FIG. 37B) that provides an estimate of the apparent K_(D) for ligand binding. In this instance, half-maximal decrease in spontaneous cleavage at sites 1, 2 and 4, and the corresponding half-maximal increase in spontaneous cleavage at site 3, occur when approximately 300 nM adenine is present in the in-line probing assay. Thus, the ydhL aptamer binds adenine with an apparent K_(D) that is similar to those exhibited by other classes of riboswitches.
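The way a half-maximal change in cleavage yields an apparent K_(D) estimate can be sketched as follows. This is an illustrative sketch only: the one-site binding model, the function names, and the synthetic titration data are assumptions and do not reproduce the quantitation actually used for FIG. 37B. The signal is normalized between its values at the lowest and highest ligand concentrations, so the same routine handles sites where cleavage decreases and sites where it increases.

```python
import math


def one_site_fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of RNA bound under an assumed simple one-site binding model."""
    return ligand_nM / (ligand_nM + kd_nM)


def half_maximal_concentration(conc_nM, signal):
    """Ligand concentration at which the signal is halfway between its
    first and last values, interpolated on a log10 concentration scale.

    conc_nM must be in ascending order; signal may rise or fall.
    """
    if len(conc_nM) != len(signal) or len(conc_nM) < 2:
        raise ValueError("need matching concentration and signal lists")
    start, end = signal[0], signal[-1]
    if start == end:
        raise ValueError("signal does not change across the titration")
    # Normalize so the titration always runs from 0 (start) to 1 (end).
    norm = [(s - start) / (end - start) for s in signal]
    for i in range(1, len(norm)):
        a, b = norm[i - 1], norm[i]
        if a != b and (a - 0.5) * (b - 0.5) <= 0:
            t = (0.5 - a) / (b - a)
            log_lo = math.log10(conc_nM[i - 1])
            log_hi = math.log10(conc_nM[i])
            return 10 ** (log_lo + t * (log_hi - log_lo))
    raise ValueError("signal never crosses its half-maximal value")


if __name__ == "__main__":
    # Synthetic titration from 1 nM to 10 uM generated with an assumed K_D of 300 nM.
    concs = [1, 3, 10, 30, 100, 300, 1000, 3000, 10000]
    rising = [one_site_fraction_bound(c, 300.0) for c in concs]
    print(half_maximal_concentration(concs, rising))
```

Because the titration does not reach full saturation at the highest concentration, the interpolated midpoint slightly underestimates the K_(D) used to generate the synthetic data; a curve fit to the full binding equation would be the more rigorous choice.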

The molecular recognition characteristics of 80 ydhL were further examined by using the same in-line probing strategy with a variety of analogs. For example, a series of purine analogs that are close chemical variants to adenine exhibit measurable binding to the RNA (FIG. 38A). The ligands with measurable binding, 2,6-DAP, A and 2-AP, P, MA (listed in order of decreasing affinity), are all close analogs of adenine. Furthermore, the relative affinities of the RNA for various ligands provide some indication of the contact points that the aptamer likely uses to establish molecular recognition (FIG. 38A, bottom right). This model is consistent with the finding that a series of purine analogs fail to exhibit measurable binding to the 80 ydhL RNA (FIG. 38B).

The collection of purines that are recognized by 80 ydhL indicates that only the Watson/Crick base-pairing face of the purine ligand is recognized differently by the ydhL aptamer compared to the xpt aptamer. For example, modification at the C8 position (8-chloroadenine) prevents ligand binding, which implies that a steric clash occurs between certain purines and 80 ydhL, as was observed for the xpt aptamer (see Example 6). Interestingly, the fact that 2,6-DAP, and not adenine, is the tightest-binding ligand provides insight into the similarities between the ydhL and xpt aptamers. This observation suggests that the 80 ydhL RNA retains at least one of the two hydrogen bond acceptor contacts that were proposed to exist in the xpt aptamer. Thus, the molecular recognition characteristics of these RNAs are consistent with the ydhL RNA differing in molecular recognition from xpt with a pattern that can be explained by a change from a Watson/Crick guanine-C base pair in xpt to a Watson/Crick adenine-U base pair in ydhL.

iv. Swapping Ligand Specificity of G Box RNAs by Molecular Engineering

The idea that the xpt and ydhL RNAs might be deriving their specificity for guanine or adenine by a Watson/Crick base pairing interaction was examined in greater detail by using a molecular engineering approach. A similar approach was used previously (Wilson, K. S. & von Hippel, P. H. Proc. Natl. Acad. Sci. USA 92, 8793-8797) to change the ligand-rescue specificity of an abasic hammerhead ribozyme construct from guanine to adenine. Both wild-type (93 xpt and 80 ydhL) and mutant (93 xpt C to U and 80 ydhL U to C) forms of G box aptamers were generated and tested for binding activity with guanine and adenine (FIG. 39). The mutations correspond to nucleotide position 74 relative to the xpt sequence (FIG. 35B), which is suspected to be the determinant of molecular discrimination between guanine and adenine.

As observed previously (see Example 6), the aptamer based on xpt exhibits structural modulation only when incubated in the presence of guanine, and is able to shift the distribution of tritiated guanine (but not adenine) in an equilibrium dialysis assay (FIG. 39A). However, the 93 xpt RNA that carries a single C to U mutation at position 74 no longer is responsive to guanine, but exhibits structural modulation and binding activity during equilibrium dialysis only in the presence of adenine (FIG. 39B). In contrast, the wild-type 80 ydhL RNA is specific for adenine (FIG. 39C), while the corresponding U to C mutation at this critical nucleotide position alters binding specificity to guanine (FIG. 39D). Therefore, the primary determinant of the base specificity of G box aptamers is the C or U residue that is present in the junction between stems P1 and P3, and this base most likely forms a conventional Watson/Crick base pair with its target ligand.

v. Mechanism of Genetic Control by the ydhL Adenine Riboswitch from B. subtilis

In most instances, riboswitches control gene expression in prokaryotes by allosteric interconversion between alternate base-paired structures. For example, a TPP riboswitch from the thiM gene of E. coli makes use of alternate base pairing to sequester the Shine-Dalgarno sequence of the mRNA in the presence of ligand, presumably resulting in reduced translation initiation (see Example 2). In contrast, TPP riboswitches from B. subtilis harness ligand-binding events to alter base-pairing patterns and form intrinsic terminator stems that cause transcription elongation to abort (Gusarov, I. & Nudler, E. Mol. Cell. 4, 495-504 (1999); Mironov, A. S. et al. Cell 111, 747-756 (2002)). Similarly, metabolite-mediated formation of transcription terminator stems is a mechanism used by certain examples of riboswitches that respond to FMN (see Examples 3 and 6), SAM (see Example 7), guanine (see Example 6), and lysine (see Example 5).

The UTR sequence of the ydhL riboswitch was examined to assess whether there is evidence of a transcription termination mechanism. Consistent with this possibility is the fact that the 5′-UTR of the ydhL mRNA can form a large hairpin, composed of as many as 22 base pairs, followed by a run of eight uridyl residues (FIG. 40A). This structural feature, which was also noted elsewhere recently (Johansen, L. E., et al., J. Bacteriol. 185, 5200-5209), is characteristic of an intrinsic terminator stem. In the absence of adenine, it was considered that the riboswitch can form this intrinsic terminator. If true, then the genetic control status for this riboswitch would default to this predicted ‘OFF’ state, which prevents gene expression by inducing transcription termination. In the presence of adenine, gene expression is expected to proceed because a substantial portion of the left shoulder of the terminator stem would be required to form stems P1 and P3 of the adenine aptamer domain. Since stems P1 and P3 are integral components of the adenine aptamer, ligand binding would establish a structure that precludes formation of the terminator stem.

This mechanism for the ydhL riboswitch was assessed in vivo by generating reporter constructs wherein various forms of guanine- and adenine-specific riboswitches were integrated into the B. subtilis genome. As controls, two reporter constructs were prepared with either the wild-type xpt riboswitch, or the xpt variant with the C to U mutation at position 74. As expected, the wild-type xpt construct causes repression of β-galactosidase expression when presented with excess guanine in the culture medium (FIG. 40B). This finding is similar to those reported previously for function of the guanine riboswitch from xpt (see Example 6). Adenine also shows a modest (˜4 fold) repression of reporter expression after a six-hour incubation. This latter effect is most likely due to the function of the PurR protein, which is known to provide modest down-regulation of transcription initiation in response to adenine at the xpt-pbuX promoter used in this construct (Christiansen, L. C., et al., J. Bacteriol. 179, 2540-2550 (1997)).

A near identical xpt construct carrying the C to U mutation causes a loss of regulation upon addition of guanine, but shows no change in the putative protein-dependent control due to adenine (FIG. 40C). These results are consistent with the observed loss of guanine binding in vitro when this mutation is made, but suggest that the resulting specificity change to adenine in vitro does not permit robust adenine-dependent genetic control in vivo. Most likely, the diminished expression upon addition of adenine again is due to the PurR protein.

In contrast to the xpt riboswitch, the performance of the corresponding wild-type and mutant ydhL reporter constructs indicates that the latter is an adenine-dependent riboswitch with the opposite response to rising levels of ligand. Specifically, the wild-type ydhL construct exhibits very low β-galactosidase activity when assayed in the absence of ligand, or in the presence of guanine (FIG. 40D). However, a greater than 10-fold increase in gene expression occurs in response to added adenine. In addition, the single U to C mutation in the P1-P3 junction of the aptamer causes substantial (˜100 fold) derepression regardless of what ligand is used (FIG. 40E). Although this seems counter to the model proposed for ydhL riboswitch function, it is important to note that this mutation indeed disrupts adenine binding, but it also causes a mismatch to occur in the terminator stem. If this mismatch is sufficiently destabilizing to the terminator stem, or if this mutation adversely affects the folding pathway for the riboswitch, then the default ‘OFF’ status for the genetic control element would be expected to change to default ‘ON’. Therefore, the observed level of gene expression might be indicative of full activation of the ydhL gene when its genetic control element is indifferent to the concentrations of purines in the cell.

2. Discussion

i. The Structure and Evolution of Adenine Riboswitches

The sequence and biochemical similarities between guanine- and adenine-specific G box RNAs indicate that they are analogous in overall secondary and tertiary structure. The ease of interchanging ligand specificities of these aptamers by making single mutations to the xpt and ydhL aptamers suggests that such changes might occur with high frequency in natural populations. However, the fact that neither single-base variant of the xpt or ydhL riboswitches exhibits corresponding specificity changes in genetic control in vivo suggests that multiple mutations might be necessary to make a useful swap in riboswitch specificity.

It is important to note that the binding affinity of the resulting single-base xpt variant is not as robust for its new ligand. Specifically, the wild-type xpt RNA has an apparent K_(D) for guanine of no poorer than 5 nM (FIG. 39A), while the C to U variant of this RNA exhibits an apparent K_(D) for adenine of ˜100 nM (FIG. 39B). In this case, although the mutation results in a substantial change in base discrimination between guanine and adenine, binding affinity for the matched ligand has been somewhat degraded. In contrast, the wild-type and mutant ydhL RNAs exhibit both specificity change and retention of binding affinity for the matched ligands (FIGS. 39C and 39D). However, the affinity of the U to C variant of 80 ydhL for guanine appears to be at least 10-fold poorer than that of 93 xpt.

Thus, accessory mutations that do not directly define ligand specificity but that further adjust the binding affinity might be necessary for G box RNAs to interconvert between guanine and adenine ligands in a biological setting. In this regard, it is interesting that the ydhL and xpt aptamers differ from each other at 23 positions (FIG. 35), with only one residing within an obviously critical position (C74 of xpt). Although some of these mutations might serve to fine-tune the binding affinity of the aptamers, many could be the result of neutral drift in the RNA sequence that is permitted because they retain the essential secondary-structure elements.

ii. Genetic Control and Function of the ydhL mRNA

Mutant strains of B. subtilis that resist the toxic effects of 2-fluoroadenine were reported recently (Johansen, L. E., et al., J. Bacteriol. 185, 5200-5209). These mutations, which cause over-expression of the ydhL gene product, were mapped to the adenine riboswitch domain. In both instances, the changes (deletions) are expected to disrupt riboswitch function by eliminating a portion of the terminator stem or by eliminating both the terminator stem and portions of the adenine aptamer domain. In both instances, the variants preclude the riboswitch from adopting its default state (transcription termination), which causes unmodulated activation of gene expression.

The protein product of the ydhL gene (also termed pbuE) has been proposed to be a purine efflux pump (Johansen, L. E., et al., J. Bacteriol. 185, 5200-5209). Thus the resistance to 2-fluoroadenine conferred upon the cell by disruption of the adenine riboswitch from ydhL might be due to excretion of this toxic compound. In the natural genetic background, the presence of excess adenine within the cell most likely induces increased expression of the ydhL gene to produce the purine efflux protein. Higher levels of this protein then work to normalize the concentration of purines by pumping out of the cell one or more forms of this compound class.

iii. Riboswitch Mechanisms: Genetic Activation and Deactivation by Rising Metabolite Concentrations

The adenine riboswitch from B. subtilis also is notable for its mechanism of action. In the majority of riboswitches examined to date, metabolite binding causes a lowering of gene expression. This occurs either by ligand-mediated formation of a terminator stem to prevent transcription of the complete mRNA, or by sequestering the Shine-Dalgarno sequence and precluding translation initiation. In most instances, the down-regulation of gene expression is expected, as a build-up of sufficient levels of a particular metabolite should logically provide a signal to turn off genes that are involved in biosynthesis or import of the compound (Grundy, F. J. & Henkin, T. M. et al., Frontiers Biosci. 8, D20-31 (2003)).

The adenine riboswitch from ydhL (and presumably the add riboswitches as well) belongs to a group of genes whose functions would hint at the need for riboswitch activation in the presence of high concentrations of target compounds. In the case of ydhL, disposal of excess purines would seem to be an important capability given that certain purines such as guanine are insoluble at modest concentrations. Alternatively, there would be no obvious need to express adenine deaminase if adenine concentrations were exceptionally low, and therefore we expect that the riboswitches from the add genes of C. perfringens and V. vulnificus might be activated by ligand binding as well. Interestingly, T box domains, which are 5′-UTR structures that control the expression of many aminoacyl-tRNA synthetases in B. subtilis and other Gram-positive organisms (Grundy, F. J., et al., Proc. Natl. Acad. Sci. USA 99, 11121-11126), also induce gene expression in response to rising concentrations of the target they sense. However, unlike the known metabolite-binding riboswitches, T box domains sense the biochemical precursor (non-aminoacylated tRNAs) to the products of the enzymes whose expression they control (Miller, J. H. A Short Course in Bacterial Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1992)).

Although we expect that riboswitches that induce gene activation in response to increasing metabolite will occur less frequently due to genetic necessity, there are no inherent structural flaws in RNA folding that would skew this distribution between gene-activating and gene-deactivating riboswitches. Whether the riboswitch responds to ligand binding by activating or repressing gene expression, the RNAs will exploit allosteric changes in secondary and/or tertiary structure that are based on the same principles of RNA folding. The only obligate difference between activating and repressing riboswitches is in the fine structure of the expression platform, whereas the aptamer domain can remain largely unchanged.

3. Methods

i. Purine Analogs

Guanine, adenine, 2,6-diaminopurine, 2-aminopurine, hypoxanthine, xanthine, 1-methyladenine, purine, 6-methylaminopurine, N⁶,N⁶-dimethyladenine, 6-mercaptopurine, 3-methyladenine, guanine-8-³H and adenine-2,8-³H were purchased from Sigma. 6-cyanopurine and 8-azaadenine were obtained from Aldrich, and 2-chloroadenine and 8-chloroadenine from Biolog Life Science Institute, Germany.

ii. DNA Oligonucleotides

Oligonucleotides were synthesized by the HHMI Keck Foundation Biotechnology Resource Center at Yale University, purified by denaturing polyacrylamide gel electrophoresis, and were eluted from the gel by crush-soaking in a buffer containing 10 mM Tris-HCl (pH 7.5 at 23° C.), 200 mM NaCl, and 1 mM EDTA. DNAs were precipitated with ethanol, resuspended in deionized water, and stored at −20° C. until use.

iii. In-Line Probing of RNA Constructs

RNA constructs were synthesized from the corresponding PCR DNA templates by transcription in vitro using T7 RNA polymerase, dephosphorylated, and 5′-end labeled with ³²P as described in Example 6. In a typical in-line probing assay, 2 nM of labeled RNA were incubated in a buffer containing 20 mM MgCl₂, 50 mM Tris-HCl (pH 8.3 at 25° C.) and 100 mM KCl in the absence or presence of purine compounds as indicated for each experiment for 40 hrs at 25° C. Purine concentrations ranging from 1 nM to 10 μM were employed unless otherwise noted. At the end of each incubation, spontaneously cleaved products were separated on a denaturing (8 M urea) 10% PAGE, visualized using a PhosphorImager and quantitated using ImageQuaNT software (Molecular Dynamics).

iv. Equilibrium Dialysis

Equilibrium dialysis assays were conducted using a DispoEquilibrium Dialyzer (Harvard Biosciences), wherein chamber A and B are separated by a 5,000 MWCO membrane. Chamber A contained 30 μl of ³H-guanine or ³H-adenine at a concentration of 100 nM in a buffer containing 50 mM Tris-HCl (pH 8.5 at 25° C.), 20 mM MgCl₂, and 100 mM KCl. A 30 μl aliquot of the above mentioned buffer containing RNA at 3 μM concentration was delivered into chamber B. Equilibrations were allowed to proceed for 10 hrs at 25° C. Subsequently 5 μl was withdrawn from each chamber and quantitated by liquid scintillation counting.
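The scintillation counts from the two chambers can be reduced to simple summary values, as sketched below. The function name and the example counts are hypothetical. The sketch assumes equal chamber volumes and equal aliquots (as in the 30 μl and 5 μl volumes stated above), that free ligand has fully equilibrated across the membrane, and that background counts have already been subtracted.

```python
def dialysis_summary(cpm_chamber_a: float, cpm_chamber_b: float) -> dict:
    """Summarize an equilibrium dialysis measurement.

    cpm_chamber_a: counts from the ligand-only side (free ligand).
    cpm_chamber_b: counts from the RNA side (free plus RNA-bound ligand).
    """
    if cpm_chamber_a <= 0 or cpm_chamber_b < 0:
        raise ValueError("counts must be positive for A and non-negative for B")
    # A ratio near 1 indicates no detectable binding; above 1 indicates
    # that the RNA has shifted the ligand distribution toward chamber B.
    ratio_b_to_a = cpm_chamber_b / cpm_chamber_a
    # At equilibrium the free ligand concentration is the same on both
    # sides, so the excess in B is attributed to RNA-bound ligand.
    bound_fraction = (cpm_chamber_b - cpm_chamber_a) / (cpm_chamber_a + cpm_chamber_b)
    return {"ratio_b_to_a": ratio_b_to_a, "bound_fraction_of_total": bound_fraction}


if __name__ == "__main__":
    # Illustrative counts only.
    print(dialysis_summary(1000.0, 3000.0))
```

A slightly negative bound fraction can arise from counting noise when no binding occurs and should be read as zero within error.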

v. Construction of xpt- and ydhL-lacZ Fusions

A DNA construct encompassing nucleotides −468 to +9 relative to the translational start site of ydhL was PCR amplified from B. subtilis strain 1A40 (Bacillus Genetic Stock Center, Columbus, Ohio) with primers that introduced EcoR1-BamH1 restriction sites. The wild-type construct was cloned into pDG1661 at EcoR1-BamH1 restriction sites directly upstream of the lacZ reporter gene and sequenced to confirm its integrity. The resulting plasmid was used as a template for site-directed mutagenesis via the QuikChange site-directed mutagenesis kit (Stratagene) using the appropriate primer. Plasmid variants were integrated into the amyE locus of B. subtilis strain 1A40 and the transformants were confirmed as described in Example 6.

vi. In Vivo Analysis of Riboswitch Function

Transformed B. subtilis cells were grown to mid log phase with constant shaking at 37° C. in minimal media containing 0.4% w/v glucose, 20 g/l (NH₄)₂SO₄, 25 g/l K₂HPO₄, 6 g/l KH₂PO₄, 1 g/l sodium citrate, 0.2 g/l MgSO₄·7H₂O, 0.2% glutamate, 5 μg/ml chloramphenicol, 50 μg/ml L-tryptophan, 50 μg/ml L-lysine and 50 μg/ml L-methionine. Guanine or adenine was added to a final concentration of 0.1 mg/ml. Cells at mid exponential stage were harvested and resuspended in minimal media in the presence or absence of purines and grown for an additional time as indicated for each experiment, at which time 1 ml of cell culture was subjected to β-galactosidase activity assays using a variation of the method described by Miller (Miller, J. H. A Short Course in Bacterial Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1992)).
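For orientation, reporter activity from such assays is commonly expressed in Miller units and compared as a fold change between conditions. The sketch below uses the classic Miller formula as an assumption; the text states only that a variation of Miller's method was used, so the exact calculation applied here is not known. The function names and the example readings are hypothetical.

```python
def miller_units(od420: float, od550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Classic Miller-unit calculation (assumed, not confirmed by the text).

    units = 1000 * (OD420 - 1.75 * OD550) / (t * v * OD600)
    where t is the reaction time in minutes and v is the culture volume
    assayed in ml.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


def fold_change(activity_with_ligand: float, activity_without_ligand: float) -> float:
    """Ratio of reporter activity with ligand to activity without ligand.

    Values above 1 indicate activation; values below 1 indicate repression.
    """
    if activity_without_ligand <= 0:
        raise ValueError("reference activity must be positive")
    return activity_with_ligand / activity_without_ligand


if __name__ == "__main__":
    # Illustrative readings only.
    print(miller_units(od420=0.9, od550=0.0, od600=0.5, minutes=10.0, volume_ml=1.0))
```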

I. Example 9 Tables of Sequence Comparisons for the SAM, Cobalamin, Guanine, Adenine, and Lysine Riboswitches Discussed Herein

FIG. 41 shows sequences and types of riboswitches. The alignment of these sequences is as disclosed herein; regions disclosed in the other figures correspond to the same regions in FIG. 41.

Additional riboswitches were found based on published alignments and secondary structures (Grundy, F. J. & Henkin, T. M. The S box regulon: a new global transcription termination control system for methionine and cysteine biosynthesis genes in Gram-positive bacteria. Mol. Microbiol. 30, 737-749 (1998)) using the SequenceSniffer program. This program finds degenerate matches to RNA patterns defined by linked sequence motifs and base pairing constraints. In the alignments, base pairing regions have the identical underline styles or boxes and are labeled as in the corresponding figures discussed in Examples 1-8, with the addition of a putative pseudoknot marked PS. Predicted terminators (short dashed underline) and start codons (long dashed underline) are marked for some sequences. Positions for each sequence in the indicated Genbank record or unfinished genome contig are for the sequence column marked with a circle (), the fifth base in stem P1 that is 5′ of the aptamer. Start is the offset from the column marked with an asterisk (*), the sixth base in stem P1 that is 3′ of the aptamer, to the start codon of the first gene in the operon. Genes were identified from COGNITOR (Tatusov, R. L., et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22-28 (2001)) and PFAM (Bateman, A., et al. The Pfam Protein Families Database. Nucleic Acids Res. 30, 276-280 (2002)) database matches to protein sequences annotated in the Genbank records. The standard names from these databases are used when possible (2011=COG2011; ????=no matches). Previous operon designations for B. subtilis are given in parentheses (Grundy, F. J. & Henkin, T. M. The S box regulon: a new global transcription termination control system for methionine and cysteine biosynthesis genes in Gram-positive bacteria. Mol. Microbiol. 30, 737-749 (1998)).
A subset of sequences with <90% pairwise identity between the bases encompassed by stem P1 was selected for determining the consensus sequence. In the consensus sequence, lowercase and uppercase bases indicate >80% and >95% conservation at a position, respectively. Purine (R) and pyrimidine (Y) bases were assigned when no single base had >80% conservation.
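The consensus notation described above can be sketched in a few lines. This is an illustrative sketch rather than the procedure actually used to build FIG. 41: the function name is hypothetical, the same >80% threshold is assumed for assigning R or Y (the text does not state one), and a period is used as an assumed placeholder where no consensus symbol applies. Gap characters count toward the column total but never become the consensus symbol.

```python
from collections import Counter


def consensus(aligned, low=0.80, high=0.95, fill="."):
    """Build a consensus string from equal-length aligned RNA sequences.

    Uppercase: one base conserved above `high`.
    Lowercase: one base conserved above `low` but not above `high`.
    R / Y: no single base above `low`, but purines (A+G) or
    pyrimidines (C+U) together exceed `low` (assumed threshold).
    `fill`: anything else (assumed placeholder symbol).
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned)
    out = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in aligned)
        freq = {base: counts.get(base, 0) / n for base in "ACGU"}
        top = max(freq, key=freq.get)
        if freq[top] > high:
            out.append(top)
        elif freq[top] > low:
            out.append(top.lower())
        elif freq["A"] + freq["G"] > low:
            out.append("R")
        elif freq["C"] + freq["U"] > low:
            out.append("Y")
        else:
            out.append(fill)
    return "".join(out)


if __name__ == "__main__":
    # Toy alignment, not taken from the disclosed sequences.
    toy = ["GAACA", "GAACA", "GAACA", "GAACC", "GAACC",
           "GAGUC", "GAGUG", "GAGUG", "GAGUU", "GCGUU"]
    print(consensus(toy))
```

Filtering the input to sequences with <90% pairwise identity, as described above, would be a separate step performed before calling such a function.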

(*) Sequence shares >90% identity with another sequence, and was excluded when determining the consensus.
(1) Very short hypothetical gene that may be a misannotated ORF.
(2) Possible S Box “pseudogene”. The S Box is on the opposite strand 5′ of the indicated operon.

It is understood that the disclosed method and compositions are not limited to the particular methodology, protocols, and reagents described as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a riboswitch” includes a plurality of such riboswitches, reference to “the riboswitch” is a reference to one or more riboswitches and equivalents thereof known to those skilled in the art, and so forth.

“Optional” or “optionally” means that the subsequently described event, circumstance, or material may or may not occur or be present, and that the description includes instances where the event, circumstance, or material occurs or is present and instances where it does not occur or is not present.

Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, also specifically contemplated and considered disclosed is the range from the one particular value and/or to the other particular value unless the context specifically indicates otherwise. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another, specifically contemplated embodiment that should be considered disclosed unless the context specifically indicates otherwise. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint unless the context specifically indicates otherwise. Finally, it should be understood that all of the individual values and sub-ranges of values contained within an explicitly disclosed range are also specifically contemplated and should be considered disclosed unless the context specifically indicates otherwise. The foregoing applies regardless of whether in particular cases some or all of these embodiments are explicitly disclosed.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of skill in the art to which the disclosed method and compositions belong. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present method and compositions, the particularly useful methods, devices, and materials are as described. Publications cited herein and the material for which they are cited are hereby specifically incorporated by reference. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such disclosure by virtue of prior invention. No admission is made that any reference constitutes prior art. The discussion of references states what their authors assert, and applicants reserve the right to challenge the accuracy and pertinency of the cited documents. It will be clearly understood that, although a number of publications are referred to herein, such reference does not constitute an admission that any of these documents forms part of the common general knowledge in the art.

Throughout the description and claims of this specification, the word “comprise” and variations of the word, such as “comprising” and “comprises,” means “including but not limited to,” and is not intended to exclude, for example, other additives, components, integers or steps.

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the method and compositions described herein. Such equivalents are intended to be encompassed by the following claims.

1-19. (canceled)
20. A method of killing or inhibiting the growth of a bacterial cell, the method comprising bringing into contact the cell and a compound, wherein the growth of the cell is inhibited, wherein the compound was identified by testing a compound for inhibition of expression of a gene encoding an RNA comprising a riboswitch, wherein the inhibition is via the riboswitch, wherein if the compound inhibits expression of the gene encoding the RNA comprising the riboswitch then the compound is identified as a trigger molecule of the riboswitch, wherein the compound inhibits expression of the gene by binding to the riboswitch.
21. The method of claim 20, wherein the cell is in a patient.
22. A method of killing or inhibiting the growth of a fungal cell, the method comprising bringing into contact the cell and a compound, wherein the growth of the cell is inhibited, wherein the compound was identified by testing a compound for inhibition of expression of a gene encoding an RNA comprising a riboswitch, wherein the inhibition is via the riboswitch, wherein if the compound inhibits expression of the gene encoding the RNA comprising the riboswitch then the compound is identified as a trigger molecule of the riboswitch, wherein the compound inhibits expression of the gene by binding to the riboswitch.
23. The method of claim 22, wherein the cell is in a patient.
24. A method of inhibiting cell growth, the method comprising bringing into contact a cell and a compound, wherein the compound can bind to a riboswitch, wherein the cell comprises a gene encoding an RNA comprising a riboswitch, wherein the compound inhibits expression of the gene by binding to the riboswitch.
25. The method of claim 24, wherein the riboswitch is a guanine-responsive riboswitch.
26. The method of claim 24, wherein the riboswitch is an adenine-responsive riboswitch.
27. The method of claim 24, wherein the riboswitch is a lysine-responsive riboswitch.
28. The method of claim 24, wherein the riboswitch is a thiamine pyrophosphate-responsive riboswitch.
29. The method of claim 24, wherein the riboswitch is a flavin mononucleotide-responsive riboswitch.
30. The method of claim 24, wherein the riboswitch is an S-adenosylmethionine-responsive riboswitch.
31. A method of killing or inhibiting the growth of a bacterial cell, the method comprising bringing into contact the cell and a compound identified as a trigger molecule of the riboswitch, wherein the growth of the cell is inhibited, wherein the compound identified as a trigger molecule of the riboswitch is produced by testing a test compound for inhibition of gene expression of a gene encoding an RNA comprising a riboswitch, wherein the inhibition is via the riboswitch, wherein if the test compound that inhibits gene expression of the gene encoding the RNA comprising the riboswitch the test compound is identified as a trigger molecule of the riboswitch, and producing the test compound.
32. The method of claim 31, wherein the cell is in a patient.
33. A method of killing or inhibiting the growth of a fungal cell, the method comprising bringing into contact the cell and a compound identified as a trigger molecule of the riboswitch, wherein the growth of the cell is inhibited, wherein the compound is identified as a trigger molecule of the riboswitch is produced by testing a test compound for inhibition of gene expression of a gene encoding an RNA comprising a riboswitch, wherein the inhibition is via the riboswitch, wherein if the test compound that inhibits gene expression of the gene encoding the RNA comprising the riboswitch the test compound is identified as a trigger molecule of the riboswitch, and producing the test compound.
34. The method of claim 33, wherein the cell is in a patient.